Abstract

This paper addresses the problem of recovering relative structure, in the
form of an invariant, from two views of a 3D scene. The invariant structure
is computed without any prior knowledge of camera geometry, and makes no
assumptions about the existence of perspective distortions in the input images.

We show that, given the location of epipoles, the projective structure
invariant can be constructed from only four corresponding points projected
from four non-coplanar points in space (as in the case of parallel
projection). This result leads to two algorithms for computing projective
structure. The first algorithm requires six corresponding points, four of
which are assumed to be projected from four coplanar points in space.
Alternatively, the second algorithm requires eight corresponding points,
without assumptions of coplanarity of object points.

Our  study  of  projective  structure  is  applicable  to  both  structure  from 
motion  and  visual  recognition.  We  use  projective  structure  to  re-project  the 
3D  scene  from  two  model  images  and  six  or  eight  corresponding  points  with 
a  novel  view  of  the  scene.  The  re-projection  process  is  well-defined  under 
all  cases  of  central  projection,  including  the  case  of  parallel  projection. 

1  Introduction

The problem we address in this paper is that of recovering
relative, non-metric structure of a three-dimensional
scene from two images taken from different viewing
positions. The relative structure information is in the form
of an invariant that can be computed without any prior
knowledge of camera geometry, and under all central
projections, including the case of parallel projection. The
non-metric nature of the invariant allows the cameras to
be internally uncalibrated (the intrinsic parameters of the
camera are unknown). The unique nature of the invariant
allows the system to make no assumptions about the existence
of perspective distortions in the input images. Therefore,
any degree of perspective distortion is allowed, i.e.,
orthographic and perspective projections are treated alike;
in other words, no assumptions are made on the size
of the field of view.

We envision this study as having applications both in
the area of structure from motion and in the area of
visual recognition. In structure from motion our
contribution is an addition to the recent studies of non-metric
structure from motion pioneered by Koenderink and Van
Doorn (1991) in parallel projection, followed by Faugeras
(1992) and Mohr, Quan, Veillon & Boufama (1992) for
reconstructing the projective coordinates of a scene up
to an unknown projective transformation of 3D
projective space. Our approach is similar to Koenderink and
Van Doorn’s in the sense that we derive an invariant,
based on a geometric construction, that records the 3D
structure of the scene as a variation from two fixed
reference planes measured along the line of sight. Unlike
Faugeras and Mohr et al., we do not recover the
projective coordinates of the scene and, as a result, we use a
smaller number of corresponding points: in addition to
the location of the epipoles we need only four
corresponding points, coming from four non-coplanar points in the
scene, whereas Faugeras and Mohr et al. require
correspondences coming from five points in general position.

The second contribution of our study is to visual
recognition of 3D objects from 2D images. We show that our
projective invariant can be used to predict novel views of
the object, given two model views in full correspondence
and a small number of corresponding points with the
novel view. The predicted view is then matched against
the novel input view, and if the two match, then the
novel view is considered to be an instance of the same
object that gave rise to the two model views stored in
memory. This paradigm of recognition is within the general
framework of alignment (Fischler and Bolles 1981, Lowe
1985, Ullman 1989, Huttenlocher and Ullman 1987) and,
more specifically, of the paradigm proposed by Ullman
and Basri (1989) that recognition can proceed using only
2D images, both for representing the model and when
matching the model to the input image. We refer to the
problem of predicting a novel view from a set of model
views, using a limited number of corresponding points,
as the problem of re-projection.

The problem of re-projection has been dealt with in
the past primarily assuming parallel projection (Ullman
and Basri 1989, Koenderink and Van Doorn 1991).
For the more general case of central projection, Barrett,
Brill, Haag & Payton (1991) have recently introduced a
quadratic invariant based on the fundamental matrix of
Longuet-Higgins (1981), which is computed from eight
corresponding points. In Appendix E we show that
their result is equivalent to intersecting epipolar lines,
and therefore is singular for certain viewing
transformations, depending on the viewing geometry between the
two model views. Our projective invariant is not based
on an epipolar intersection, but is based directly on the
relative structure of the object, and does not suffer from
any singularities, a finding that implies greater stability
in the presence of errors.

The projective structure invariant, and the
re-projection method that follows, are based on an
extension of Koenderink and Van Doorn’s representation of
affine structure as an invariant defined with respect to
a reference plane and a reference point. We start by
introducing an alternative affine invariant, using two
reference planes (section 5), which can easily be extended
to projective space. As a result we obtain a projective
structure invariant (section 6).

We show that the difference between the affine and
projective cases lies entirely in the location of the epipoles,
i.e., given the location of the epipoles, both the affine and
projective structures are constructed by linear methods
using the information captured from four corresponding
points projected from four non-coplanar points in space.
In the projective case we need additional corresponding
points, solely for the purpose of recovering the location
of the epipoles (Theorem 1, section 6).

We show that the projective structure invariant can
be recovered from two views (produced by parallel or
central projection) and six corresponding points, four
of which are assumed to be projected from four coplanar
points in space (section 7.1). Alternatively, the
projective structure can be recovered from eight corresponding
points, without assuming coplanarity of object points
(section 8.1). The 8-point method uses the fundamental
matrix approach (Longuet-Higgins, 1981) for
recovering the location of epipoles (as suggested by Faugeras,
1992).

Finally, we show that, for both schemes, it is possible
to limit the viewing transformations to the group of rigid
motions, i.e., it is possible to work with perspective
projection assuming the cameras are calibrated. The result,
however, does not include orthographic projection.

Experiments were conducted with both algorithms,
and the results show that the 6-point algorithm is
stable under noise and under conditions that violate the
assumption that four object points are coplanar. The
8-point algorithm, although theoretically superior because
of the lack of the coplanarity assumption, is considerably
more sensitive to noise.

2  Why  not  Classical  SFM? 

The work of Koenderink and Van Doorn (1991) on affine
structure from two orthographic views, and the work of
Ullman and Basri (1989) on re-projection from two
orthographic views, have a clear practical aspect: it is
known that at least three orthographic views are
required to recover metric structure, i.e., relative depth
(Ullman 1979, Huang & Lee 1989, Aloimonos & Brown
1989). Therefore, the suggestion to use affine structure
instead of metric structure allows a recognition system
to perform re-projection from two model views (Ullman
& Basri), and to generate novel views of the object
produced by affine transformations in space, rather than by
rigid transformations (Koenderink & Van Doorn).

This advantage, of working with two rather than three
views, is not present under perspective projection,
however. It is known that two perspective views are sufficient
for recovering metric structure (Roach & Aggarwal 1979,
Longuet-Higgins 1981, Tsai & Huang 1984, Faugeras &
Maybank 1990). The question, therefore, is why look for
alternative representations of structure, and new
methods for performing re-projection?

There are three major problems in structure from
motion methods: (i) critical dependence on an orthographic
or perspective model of projection, (ii) internal camera
calibration, and (iii) the problem of stereo-triangulation.

The first problem is the strict division between
methods that assume orthographic projection and methods
that assume perspective projection. These two classes
of methods do not overlap in their domain of
application. The perspective model operates under conditions
of significant perspective distortions, such as driving on
a stretch of highway, and requires a relatively large field of
view and relatively large depth variations between scene
points (Adiv 1989, Dutta & Snyder 1990, Tomasi 1991,
Broida et al. 1990). The orthographic model, on the
other hand, provides a reasonable approximation when
the imaging situation is at the other extreme, i.e., small
field of view and small depth variation between object
points (a situation for which perspective schemes often
break down). Typical imaging situations are at neither
end of these extremes and, therefore, would be
vulnerable to errors in both models. From the standpoint of
performing recognition, this problem implies that the
viewer has control over his field of view, a property
that may be reasonable to assume at the time of model
acquisition, but is less reasonable to assume at
recognition time.

The second problem is related to internal camera
calibration. The assumption of perspective projection
includes a distinguishable point, known as the principal
point, which is at the intersection of the optical axis and
the image plane. The location of the principal point is
an internal parameter of the camera, which may deviate
somewhat from the geometric center of the image plane,
and therefore may require calibration. Perspective
projection also assumes that the image plane is
perpendicular to the optical axis, and the possibility of imperfections
in the camera requires, therefore, the recovery of the two
axes describing the image frame, and of the focal length.
Although the calibration process is somewhat tedious, it
is sometimes necessary for many of the available
commercial cameras (Brown 1971, Faig 1975, Lenz and Tsai
1987, Faugeras, Luong and Maybank 1992). The
problem of calibration is lesser under orthographic projection
because the projection does not have a distinguishable
ray, so any point can serve as an origin; it must still be
considered, however, because of the assumption that
the image plane is perpendicular to the projecting rays.

The third problem is related to the way shape is
typically represented under the perspective projection
model. Because the center of projection is also the
origin of the coordinate system for describing shape, the
shape difference (e.g., the difference in depth between two
object points) is orders of magnitude smaller than the
distance to the scene, and this makes the computations
very sensitive to noise. The sensitivity to noise is
reduced if images are taken from distant viewpoints (large
base-line in stereo triangulation), but that makes the
process of establishing correspondence between points in
both views more of a problem, and hence may make the
situation even worse. This problem does not occur
under the assumption of orthographic projection because
translation in depth is lost under orthographic
projection, and therefore the origin of the coordinate system
for describing shape (metric and non-metric) is
object-centered, rather than viewer-centered (Tomasi, 1991).

These problems, in isolation or put together, account for
much of the sensitivity of structure from
motion methods to errors. The recent work of Faugeras
(1992) and Mohr et al. (1992) addresses the problem of
internal calibration by assuming central projection
instead of perspective projection. Faugeras and Mohr et
al. then proceed to reconstruct the projective
coordinates of the scene. Since projective coordinates are
measured relative to the center of projection, this approach
does not address the problem of stereo-triangulation or
the problem of uniformity under both orthographic and
perspective projection models.

3  Camera  Model  and  Notations 

We assume that objects in the world are rigid and are
viewed under central projection. In central projection
the center of projection is the origin of the camera
coordinate frame and can be located anywhere in projective
space. In other words, the center of projection can be
a point in Euclidean space or an ideal point (such as
happens in parallel projection). The image plane is
assumed to be arbitrarily positioned with respect to the
camera coordinate frame (unlike perspective projection,
where it is parallel to the xy plane). We refer to this as a
non-rigid camera configuration. The motion of the
camera, therefore, consists of the translation of the center of
projection, rotation of the coordinate frame around the
new location of the center of projection, followed by
tilt, pan, and focal length scale of the image plane with
respect to the new optical axis. This model of projection
will also be referred to as perspective projection with an
uncalibrated camera.

We also include in our derivations the possibility of
having a rigid camera configuration. A rigid camera is
simply the familiar model of perspective projection in
which the center of projection is a point in Euclidean
space and the image plane is fixed with respect to the
camera coordinate frame. A rigid camera motion,
therefore, consists of translation of the center of projection
followed by rotation of the coordinate frame and focal
length scaling. Note that a rigid camera implicitly
assumes internal calibration, i.e., the optical axis pierces
through a fixed point in the image and the image plane
is perpendicular to the optical axis.

Figure 1: Koenderink and Van Doorn’s Affine Structure.

We denote object points in capital letters and image
points in small letters. If $P$ denotes an object point in 3D
space, $p, p', p''$ denote its projections onto the first,
second and novel projections, respectively. We treat image
points as rays (homogeneous coordinates) in 3D space,
and refer to the notation $p = (x, y, 1)$ as the standard
representation of the image plane. We note that the
true coordinates of the image plane are related to the
standard representation by means of a projective
transformation of the plane. In case we deal with central
projection, all representations of image coordinates are
allowed, and therefore, without loss of generality, we work
with the standard representation (more on that in
Appendix A).

4  Affine  Structure:  Koenderink  and
Van  Doorn’s  Version

The affine structure invariant described by Koenderink
and Van Doorn (1991) is based on a geometric
construction using a single reference plane, and a reference
point not coplanar with the reference plane. In affine
geometry (induced by parallel projection), it is known
from the fundamental theorem of plane projectivity that
three (non-collinear) corresponding points are sufficient
to uniquely determine all other correspondences (see
Appendix A for more details on plane projectivity under
affine and projective geometry). Using three
corresponding points between two views provides us, therefore, with
a transformation (affine transformation) for determining
the location of all points of the plane passing through
the three reference points in the second image plane.

Let $P$ be an arbitrary point in the scene projecting
onto $p, p'$ on the two image planes. Let $\tilde{P}$ be the
projection of $P$ onto the reference plane along the ray towards
the first image plane, and let $\tilde{p}'$ be the projection of $\tilde{P}$
onto the second image plane ($p'$ and $\tilde{p}'$ coincide if $P$ is
on the reference plane). Note that the location of $\tilde{p}'$ is
known via the affine transformation determined by the
projections of the three reference points. Finally, let
$Q$ be the fourth reference point (not on the reference
plane). Using a simple geometric drawing, the affine
structure invariant is derived as follows.

Consider Figure 1. The projections of the reference
point $Q$ and an arbitrary point of interest $P$ form two
similar trapezoids: $P\tilde{P}p'\tilde{p}'$ and $Q\tilde{Q}q'\tilde{q}'$. From similarity
of trapezoids we have,

$$\gamma_p = \frac{|P - \tilde{P}|}{|Q - \tilde{Q}|} = \frac{|p' - \tilde{p}'|}{|q' - \tilde{q}'|}.$$

By assuming that $q, q'$ is a given corresponding point, we
obtain a shape invariant that is invariant under parallel
projection (the object points are fixed while the camera
changes the location and position of the image plane
towards the projecting rays).

Before  we  extend  this  result  to  central  projection  by 
using  projective  geometry,  we  first  describe  a  different 
affine  invariant  using  two  reference  planes,  rather  than 
one  reference  plane  and  a  reference  point.  The  new  affine 
invariant  is  the  one  that  will  be  applied  later  to  central 
projection. 

5  Affine  Structure  Using  Two 
Reference  Planes 

We make use of the same information, the projections
of four non-coplanar points, to set up two reference
planes. Let $P_j$, $j = 1, \ldots, 4$, be the four non-coplanar
reference points in space, and let $p_j \leftrightarrow p'_j$ be their
observed projections in both views. The points $P_1, P_2, P_3$
and $P_2, P_3, P_4$ lie on two different planes; therefore, we
can account for the motion of all points coplanar with
each of these two planes. Let $P$ be a point of interest,
not coplanar with either of the reference planes, and let
$\tilde{P}$ and $\hat{P}$ be its projections onto the two reference planes
along the ray towards the first view.

Consider Figure 2. The projection of $P$, $\tilde{P}$ and $\hat{P}$ onto
$p'$, $\tilde{p}'$ and $\hat{p}'$ respectively, gives rise to two similar
trapezoids from which we derive the following relation:

$$\alpha_p = \frac{|P - \tilde{P}|}{|P - \hat{P}|} = \frac{|p' - \tilde{p}'|}{|p' - \hat{p}'|}.$$

The ratio $\alpha_p$ is invariant under parallel projection. There
is no particular advantage for preferring $\alpha_p$ over $\gamma_p$ as
a measure of affine structure, but as will be described
below, this new construction forms the basis for
extending affine structure to projective structure, whereas the
single reference plane construction does not (see
Appendix D for proof).
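
The invariance of the two-reference-plane ratio under parallel projection can be checked numerically. The following is a minimal sketch, not part of the original construction: the scene point, the two planes, and the second affine camera (`M2`, `T2`) are hypothetical values chosen for illustration, and the plane projections are computed directly in 3D. In the method described here they are never observed in 3D; their images in the second view come from the transformations induced by the reference planes.

```python
import math

def intersect_plane(point, direction, normal, offset):
    """Intersect the ray point + s*direction with the plane normal . X = offset."""
    denom = sum(n * d for n, d in zip(normal, direction))
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the plane")
    s = (offset - sum(n * x for n, x in zip(normal, point))) / denom
    return tuple(x + s * d for x, d in zip(point, direction))

def project_parallel(M, t, X):
    """Parallel (affine) projection of a 3D point X: x = M X + t, with M a 2x3 matrix."""
    return tuple(sum(M[r][c] * X[c] for c in range(3)) + t[r] for r in range(2))

def affine_alpha(a, a_tilde, a_hat):
    """Ratio |a - a_tilde| / |a - a_hat| of three collinear points (any dimension)."""
    return math.dist(a, a_tilde) / math.dist(a, a_hat)

# Hypothetical scene: the first view looks along the z axis.
VIEW1_DIR = (0.0, 0.0, 1.0)
P = (1.0, 2.0, 10.0)
# Two hypothetical reference planes, each given as normal . X = offset.
P_tilde = intersect_plane(P, VIEW1_DIR, (1.0, 0.0, 1.0), 5.0)
P_hat = intersect_plane(P, VIEW1_DIR, (0.0, 1.0, 1.0), 9.0)

# Hypothetical second view: an arbitrary affine camera.
M2 = [[0.8, 0.1, 0.5], [-0.2, 0.9, 0.3]]
T2 = (5.0, -1.0)
p2, p2_tilde, p2_hat = (project_parallel(M2, T2, X) for X in (P, P_tilde, P_hat))

alpha_3d = affine_alpha(P, P_tilde, P_hat)          # ratio measured in space
alpha_image = affine_alpha(p2, p2_tilde, p2_hat)    # ratio measured in the second image
```

Because parallel projection preserves ratios of lengths along a line, `alpha_image` agrees with `alpha_3d` up to rounding for any choice of `M2` and `T2` that does not collapse the line to a point.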

In the projective plane, we need four coplanar points
to determine the motion of a reference plane. We show
that, given the epipoles, only three corresponding points
for each reference plane are sufficient for recovering the
associated projective transformations induced by those
planes. Altogether, the construction provides us with
four points along each epipolar line. The similarity of
trapezoids in the affine case turns, therefore, into a
cross-ratio in the projective case.


Figure 2: Affine structure using two reference planes.

This leads to the result (Theorem 1) that, in addition
to the epipoles, only four corresponding points, projected
from four non-coplanar points in the scene, are sufficient
for recovering the projective structure invariant for all
other points. The epipoles can be recovered by either
extending the Koenderink and Van Doorn (1991)
construction to projective space using six points (four of
which are assumed to be coplanar), or by using other
methods, notably those based on the Longuet-Higgins
fundamental matrix. This leads to projective structure
from eight points in general position.

6  Projective  Structure 

We assume for now that the location of both epipoles is
known, and we will address the problem of finding the
epipoles later. The epipoles, also known as the foci of
expansion, are the intersections of the line in space
connecting the two centers of projection and the image planes.
There are two epipoles, one on each image plane: the
epipole on the second image we call the left epipole, and
the epipole on the first image we call the right epipole.
The image lines emanating from the epipoles are known
as the epipolar lines.

Consider Figure 3, which illustrates the two reference
plane construction, defined earlier for parallel projection,
now displayed in the case of central projection. The
left epipole is denoted by $V_l$, and because it is on the
line $V_1V_2$ (connecting the two centers of projection), the
line $PV_1$ projects onto the epipolar line $p'V_l$. Therefore,
the points $\tilde{P}$ and $\hat{P}$ project onto the points $\tilde{p}'$ and $\hat{p}'$,
which are both on the epipolar line $p'V_l$. The points
$p', \tilde{p}', \hat{p}'$ and $V_l$ are collinear and projectively related to
$P, \tilde{P}, \hat{P}, V_1$, and therefore have the same cross-ratio:

$$\alpha_p = \frac{|P - \tilde{P}|}{|P - \hat{P}|} \cdot \frac{|V_1 - \hat{P}|}{|V_1 - \tilde{P}|} = \frac{|p' - \tilde{p}'|}{|p' - \hat{p}'|} \cdot \frac{|V_l - \hat{p}'|}{|V_l - \tilde{p}'|}.$$

Figure 3: Definition of projective shape as the cross ratio
of $p', \tilde{p}', \hat{p}', V_l$.

Note that when the epipole $V_l$ becomes an ideal point
(vanishes along the epipolar line), then $\alpha_p$ is the same
as the affine invariant defined in section 5 for parallel
projection.
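
As a numerical illustration of why a cross-ratio of four collinear points can serve as an invariant, the sketch below evaluates the distance form of the cross-ratio before and after an arbitrary projective transformation of the plane. The four points and the matrix `H` are hypothetical values, the function names are introduced here for illustration only, and the code assumes all four points are finite (the ideal-point case needs the homogeneous formulation).

```python
import math

def projective_alpha(p, p_tilde, p_hat, v):
    """Distance form of the cross-ratio of four collinear, finite 2D points:
    (|p - p_tilde| / |p - p_hat|) * (|v - p_hat| / |v - p_tilde|)."""
    return (math.dist(p, p_tilde) / math.dist(p, p_hat)) * (
        math.dist(v, p_hat) / math.dist(v, p_tilde)
    )

def apply_homography(H, pt):
    """Apply a 3x3 projective transformation to a finite 2D point."""
    x, y = pt
    u, v, w = (H[r][0] * x + H[r][1] * y + H[r][2] for r in range(3))
    return (u / w, v / w)

# Four hypothetical collinear points on y = 2x + 1, playing the roles of
# p', p~', p^' and the epipole.
LINE_POINTS = [(0.0, 1.0), (1.0, 3.0), (3.0, 7.0), (7.0, 15.0)]

# A hypothetical non-singular projective transformation of the plane.
H = [[2.0, 1.0, 3.0], [0.0, 1.0, -1.0], [0.1, 0.2, 1.0]]

alpha_before = projective_alpha(*LINE_POINTS)
alpha_after = projective_alpha(*(apply_homography(H, q) for q in LINE_POINTS))
```

The two values agree up to rounding, whereas the plain ratio of the first two distances alone generally changes under `H`; the extra factor involving the fourth point is what compensates for perspective distortion.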

The cross-ratio $\alpha_p$ is a direct extension of the affine
structure invariant defined in section 5 and is referred
to as projective structure. We can use this invariant to
reconstruct any novel view of the object (taken by a
non-rigid camera) without ever recovering depth or even
projective coordinates of the object.

Having defined the projective shape invariant, and
assuming we still are given the locations of the epipoles,
we show next how to recover the projections of the two
reference planes onto the second image plane, i.e., we
describe the computations leading to $\tilde{p}'$ and $\hat{p}'$.

Since we are working under central projection, we
need to identify four coplanar points on each reference
plane. In other words, in the projective geometry of the
plane, four corresponding points, no three of which are
collinear, are sufficient to determine uniquely all other
correspondences (see Appendix A for more details). We
must, therefore, identify four corresponding points that
are projected from four coplanar points in space, and
then recover the projective transformation that accounts
for all other correspondences induced from that plane.
The following proposition states that the corresponding
epipoles can be used as a fourth corresponding point for
any three corresponding points selected from the pair of
images.

Proposition 1 A projective transformation, $A$, that is
determined from three arbitrary, non-collinear,
corresponding points and the corresponding epipoles, is a
projective transformation of the plane passing through the
three object points which project onto the
corresponding image points. The transformation $A$ is an induced
epipolar transformation, i.e., the ray $Ap$ intersects the
epipolar line $p'V_l$ for any arbitrary image point $p$ and its
corresponding point $p'$.

Comment: An epipolar transformation $F$ is a mapping
between corresponding epipolar lines and is determined
(not uniquely) from three corresponding epipolar lines
and the epipoles. The induced point transformation is
$E = (F^{-1})^{t}$ (induced from the point/line duality of
projective geometry; see Appendix C for more details on
epipolar transformations).

Proof: Let $p_j \leftrightarrow p'_j$, $j = 1, 2, 3$, be three arbitrary
corresponding points, and let $V_l$ and $V_r$ denote the left
and right epipoles. First note that the four points $p_j$ and
$V_r$ are projected from four coplanar points in the scene.
The reason is that the plane defined by the three object
points $P_j$ intersects the line $V_1V_2$, connecting the two
centers of projection, at a point (regular or ideal). That
point projects onto both epipoles. The transformation
$A$, therefore, is a projective transformation of the plane
passing through the three object points $P_1, P_2, P_3$. Note
that $A$ is uniquely determined provided that no three of
the four points are collinear.

Let $\tilde{p}' = Ap$ for some arbitrary point $p$. Because lines
are projective invariants, any point along the epipolar
line $pV_r$ must project onto the epipolar line $p'V_l$. Hence,
$A$ is an induced epipolar transformation. $\Box$

Given the epipoles, therefore, we need just three points
to determine the correspondences of all other points
coplanar with the reference plane passing through the
three corresponding object points. The transformation
(collineation) $A$ is determined from the following
equations:

$$A p_j = \rho_j p'_j, \quad j = 1, 2, 3$$
$$A V_r = \rho V_l,$$

where $\rho, \rho_j$ are unknown scalars, and $A_{3,3} = 1$. One
can eliminate $\rho, \rho_j$ from the equations and solve for the
matrix $A$ from the three corresponding points and the
corresponding epipoles. That leads to a linear system
of eight equations, and is described in more detail in
Appendix A.
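
A collineation fixed by four correspondences (here, three image points plus the pair of epipoles) can be computed in several equivalent ways. The sketch below does not reproduce the eight-equation linear system of Appendix A; it uses the standard basis-change construction instead, which yields the same matrix whenever no three of the four points are collinear. All function names and the numerical values (`H_TRUE`, `SRC`) are hypothetical and serve only to exercise the code.

```python
def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("singular matrix (three of the points are collinear?)")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)] for r in range(3)]

def matvec(m, v):
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]

def basis_matrix(points):
    """Matrix sending (1,0,0), (0,1,0), (0,0,1), (1,1,1) to the four given
    homogeneous points, each up to scale."""
    a1, a2, a3, a4 = points
    cols = [[a1[r], a2[r], a3[r]] for r in range(3)]   # columns are a1, a2, a3
    lam = matvec(inv3(cols), a4)                        # a4 = sum(lam_i * a_i)
    return [[cols[r][c] * lam[c] for c in range(3)] for r in range(3)]

def collineation(src, dst):
    """3x3 matrix A with A*src[i] parallel to dst[i], normalized so A[2][2] == 1."""
    A = matmul(basis_matrix(dst), inv3(basis_matrix(src)))
    scale = A[2][2]
    if abs(scale) < 1e-12:
        raise ValueError("cannot normalize: A[2][2] is zero")
    return [[x / scale for x in row] for row in A]

# Hypothetical ground truth, used only to generate consistent test data.
H_TRUE = [[1.0, 0.2, 3.0], [0.1, 1.5, -2.0], [0.001, 0.002, 1.0]]
# Three image points and (last entry) the right epipole, in the first view.
SRC = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0), (2.0, 3.0, 1.0)]
# Their counterparts in the second view, each with an arbitrary unknown scale.
DST = [[s * x for x in matvec(H_TRUE, p)] for s, p in zip((1.0, -2.0, 0.5, 3.0), SRC)]

A_EST = collineation(SRC, DST)
```

The recovered `A_EST` matches `H_TRUE` entry by entry even though every correspondence in `DST` carries a different scale factor, which mirrors the role of the unknown scalars in the equations above.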

If $P_1, P_2, P_3$ define the first reference plane, the
transformation $A$ determines the location of $\tilde{p}'$ for all other
points $p$ ($p'$ and $\tilde{p}'$ coincide if $P$ is coplanar with the first
reference plane). In other words, we have that $\tilde{p}' = Ap$.
Note that $\tilde{p}'$ is not necessarily a point on the second
image plane, but it is on the line $V_2\tilde{P}$. We can determine
its location on the second image plane by normalizing $Ap$ such
that its third component is set to 1.

Similarly, let $P_2, P_3, P_4$ define the second reference
plane (assuming the four object points $P_j$, $j = 1, \ldots, 4$,
are non-coplanar). The transformation $E$ is uniquely
determined by the equations

$$E p_j = \rho_j p'_j, \quad j = 2, 3, 4$$
$$E V_r = \rho V_l,$$

and determines all other correspondences induced by the
second reference plane (we assume that no three of the
four points used to determine $E$ are collinear). In other
words, $Ep$ determines the location of $\hat{p}'$ up to a scale
factor along the ray $V_2\hat{P}$.

Instead of normalizing $Ap$ and $Ep$ we compute $\alpha_p$
from the cross-ratio of the points represented in
homogeneous coordinates, i.e., the cross-ratio of the four rays
$V_2p', V_2\tilde{p}', V_2\hat{p}', V_2V_l$, as follows: Let the rays $p', V_l$ be
represented as a linear combination of the rays $\tilde{p}' = Ap$
and $\hat{p}' = Ep$, i.e.,

$$p' = \tilde{p}' + k\hat{p}'$$
$$V_l = \tilde{p}' + k'\hat{p}',$$

then $\alpha_p = k/k'$ (see Appendix B for more details). This
way of computing the cross-ratio is preferred over the
more familiar cross-ratio of four collinear points, because
it enables us to work with all elements of the projective
plane, including ideal points (a situation that arises, for
instance, when epipolar lines are parallel, and in general
under parallel projection).
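
The homogeneous form of the cross-ratio can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the procedure of Appendix B: it finds the coefficient $k$ in "x is parallel to a + k b" from cross products, which is one convenient way to eliminate the unknown scale of each ray. The function names and sample rays are hypothetical. The result is signed; its absolute value corresponds to the distance form of the cross-ratio when all points are finite.

```python
def cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def combination_coefficient(x, a, b):
    """Return k such that the ray x is parallel to a + k*b.

    Assumes x lies in the plane spanned by a and b and is not parallel to b.
    From x cross (a + k*b) = 0 it follows that k = -(x×a)·(x×b) / |x×b|^2.
    """
    xa, xb = cross(x, a), cross(x, b)
    denom = dot(xb, xb)
    if denom < 1e-24:
        raise ValueError("x is parallel to b; k is unbounded")
    return -dot(xa, xb) / denom

def projective_structure(p_prime, v_left, p_tilde, p_hat):
    """alpha = k / k' for p' ~ p_tilde + k*p_hat and V_l ~ p_tilde + k'*p_hat.

    The same representatives p_tilde and p_hat must be used for both k and k';
    the ratio is then independent of their scale.
    """
    k = combination_coefficient(p_prime, p_tilde, p_hat)
    k_prime = combination_coefficient(v_left, p_tilde, p_hat)
    return k / k_prime   # raises ZeroDivisionError if V_l coincides with p_tilde

# Hypothetical rays: p_tilde plays the role of A*p, p_hat the role of E*p.
P_TILDE = (1.0, 3.0, 1.0)
P_HAT = (3.0, 7.0, 1.0)
P_PRIME = (5.0, 13.0, 3.0)        # 2 * (P_TILDE + 0.5 * P_HAT)
V_FINITE = (-21.0, -51.0, -9.0)   # -3 * (P_TILDE + 2 * P_HAT)
V_IDEAL = (-2.0, -4.0, 0.0)       # P_TILDE - P_HAT, an ideal point

alpha_finite = projective_structure(P_PRIME, V_FINITE, P_TILDE, P_HAT)
alpha_ideal = projective_structure(P_PRIME, V_IDEAL, P_TILDE, P_HAT)
```

The second call shows the case the text singles out: an epipole with third component zero is handled without any special treatment, since no normalization by the third coordinate is ever performed.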

We  have  therefore  shown  the  following  result: 

Theorem 1 In the case where the locations of the epipoles
are known, four corresponding points, coming from
four non-coplanar points in space, are sufficient for
computing the projective structure invariant $\alpha_p$ for all other
points in space projecting onto corresponding points in
both views, for all central projections, including parallel
projection.

This result shows that the difference between parallel
and central projection lies entirely in the epipoles. In
both cases four non-coplanar points are sufficient for
obtaining the invariant, but in the parallel projection case
we have prior knowledge that both epipoles are ideal,
therefore they are not required for determining the
transformations $A$ and $E$ (in other words, $A$ and $E$ are affine
transformations, more on that in Section 7.2).

Another  point  to  note  with  this  result  is  that  the 
minimal  number  of  corresponding  points  needed  for  re¬ 
projection  is  smaller  than  the  previously  reported  num¬ 
ber  (Faugeras  1992,  Mohr  et  al.  1992)  for  recovering 
the  projective  coordinates  of  object  points.  Faugeras 
shows  that  five  corresponding  points  coming  from  five 
points  in  general  position  (i.e.,  no  four  of  them  are  copla¬ 
nar)  can  be  used,  together  with  the  epipoles,  to  recover 
the  projective  coordinates  of  all  other  points  in  space. 
Because the projective structure invariant requires only
four  points,  this  implies  that  re-projection  is  done  more 
directly  than  through  full  reconstruction  of  projective 
coordinates,  and  therefore  is  likely  to  be  more  stable. 

We next discuss algorithms for recovering the location
of epipoles. The problem of recovering the epipoles
is  well  known  and  several  approaches  have  been  sug¬ 
gested  in  the  past  (Longuet-Higgins  and  Prazdny  1980, 
Rieger-Lawton  1985,  Faugeras  and  Maybank  1990,  Hil¬ 
dreth  1991,  Faugeras  1992,  Faugeras,  Luong  and  May- 
bank  1992).  We  start  with  a  method  that  requires  six 
corresponding  points  (two  additional  points  to  the  four 
we already have). The method is a direct extension of the
Koenderink and Van Doorn (1991) construction in parallel
projection, and was described earlier by Lee (1988)
for the purpose of recovering the translational component
of camera motion.

The  second  algorithm  for  locating  the  epipoles  is 
adopted  from  Faugeras  (1992)  and  is  based  on  the  fun¬ 
damental  matrix  of  Longuet-Higgins  (1981). 

7  Epipoles  from  Six  Points 

We  can  recover  the  correspondences  induced  from  the 
first  reference  plane  by  selecting  four  corresponding 
points,  assuming  they  are  projected  from  four  coplanar 
object points. Let $p_j = (x_j, y_j, 1)$ and $p'_j = (x'_j, y'_j, 1)$,
$j = 1, ..., 4$, represent the standard image coordinates
of the four corresponding points, no three of which are
collinear, in both projections. Therefore, the transformation
A is uniquely determined by the following equations,

$$\rho_j p'_j = A p_j.$$

Let $\tilde{p} = Ap$ be the homogeneous coordinate representation
of the ray $V_2\tilde{P}$, and let $\tilde{p}^{-1} = A^{-1}p'$.

Figure 4: The geometry of locating the left epipole using
two points out of the reference plane.

Having accounted for the motion of the reference
plane, we can easily find the location of the epipoles (in
standard coordinates). Given two object points $P_5$, $P_6$
that are not on the reference plane, we can find both
epipoles by observing that $\tilde{p}$ is on the left epipolar
line, and similarly that $\tilde{p}^{-1}$ is on the right epipolar line.
Stated formally, we have the following proposition:

Proposition 2 The left epipole, denoted by $V_l$, is at the
intersection of the line $p'_5\tilde{p}_5$ and the line $p'_6\tilde{p}_6$. Similarly,
the right epipole, denoted by $V_r$, is at the intersection of
$p_5\tilde{p}_5^{-1}$ and $p_6\tilde{p}_6^{-1}$.

Proof: It is sufficient to prove the claim for one of the
epipoles, say the left epipole. Consider Figure 4 which
describes the construction geometrically. By construction,
the line $P_5\tilde{P}_5V_1$ projects to the line $p'_5\tilde{p}_5$ via $V_2$
(points and lines are projective invariants) and therefore
they are coplanar. In particular, $V_1$ projects to $V_l$ which
is located at the intersection of $p'_5\tilde{p}_5$ and $V_1V_2$. Similarly,
the line $p'_6\tilde{p}_6$ intersects $V_1V_2$ at $\hat{V}_l$. Finally, $V_l$ and
$\hat{V}_l$ must coincide because the two lines $p'_5\tilde{p}_5$ and $p'_6\tilde{p}_6$
are coplanar (both are on the image plane). □

Algebraically, we can recover the ray $V_1V_2$, or $V_l$ up to
a scale factor, using the following formula:

$$V_l = (p'_5 \times \tilde{p}_5) \times (p'_6 \times \tilde{p}_6).$$

Note that $V_l$ is defined with respect to the standard coordinate
frame of the second camera. We treat the epipole
$V_l$ as the ray $V_1V_2$ with respect to $V_2$, and the epipole
$V_r$ as the same ray but with respect to $V_1$. Note also
that the third component of $V_l$ is zero if epipolar lines
are parallel, i.e., $V_l$ is an ideal point in projective terms
(happening under parallel projection, or when the non-rigid
camera motion brings the image plane to a position
where it is parallel to the line $V_1V_2$).
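
The cross-product formula for the epipole can be illustrated with a short sketch. The helper names `cross` and `epipole_from_two_pairs` are hypothetical; the inputs are assumed to be homogeneous 3-vectors, with the first point of each pair the observed image point and the second its counterpart after applying A. The cross product of two points gives the line through them, and the cross product of two lines gives their intersection.

```python
def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def epipole_from_two_pairs(p5_prime, p5_tilde, p6_prime, p6_tilde):
    """Intersect the two epipolar lines spanned by (p5', p5~) and (p6', p6~)."""
    line5 = cross(p5_prime, p5_tilde)  # epipolar line through the fifth pair
    line6 = cross(p6_prime, p6_tilde)  # epipolar line through the sixth pair
    return cross(line5, line6)         # homogeneous epipole, defined up to scale


# Two epipolar lines meeting at the Euclidean point (2, 3).
v_euclid = epipole_from_two_pairs((4, 3, 1), (6, 3, 1), (2, 5, 1), (2, 9, 1))
# Two parallel (horizontal) epipolar lines: the third component comes out zero.
v_ideal = epipole_from_two_pairs((0, 0, 1), (1, 0, 1), (0, 1, 1), (1, 1, 1))
```

The second call illustrates the remark above: when the epipolar lines are parallel, the recovered vector is an ideal point and no division by the third coordinate should be attempted.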


In the case where more than two epipolar lines are
available (such as when more than six corresponding
points are available), one can find a least-squares solution
for the epipole by using a principal component
analysis, as follows. Let B be a $k \times 3$ matrix, where
each row represents an epipolar line. The least-squares
solution for $V_l$ is the unit eigenvector associated with the
smallest eigenvalue of the $3 \times 3$ matrix $B^TB$. Note that
this can be done analytically because the characteristic
equation is a cubic polynomial.
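
A minimal sketch of this least-squares step follows. It is an illustration under stated assumptions rather than the paper's implementation: the function names are hypothetical, the eigenvalue is obtained with the standard trigonometric closed form for the cubic of a symmetric 3x3 matrix, and the smallest eigenvalue is assumed to be simple (otherwise the null vector is not unique and the cross products below vanish).

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def smallest_eigvec_sym3(m):
    """Unit eigenvector for the smallest eigenvalue of a symmetric 3x3 matrix."""
    a, b, c = m[0][0], m[1][1], m[2][2]
    d, e, f = m[0][1], m[0][2], m[1][2]
    q = (a + b + c) / 3.0
    p2 = (a - q) ** 2 + (b - q) ** 2 + (c - q) ** 2 + 2.0 * (d * d + e * e + f * f)
    p = math.sqrt(p2 / 6.0)
    if p == 0.0:
        return (1.0, 0.0, 0.0)  # multiple of the identity: any direction works
    # Determinant of the shifted and scaled matrix (m - q*I) / p.
    b00, b11, b22 = (a - q) / p, (b - q) / p, (c - q) / p
    b01, b02, b12 = d / p, e / p, f / p
    det_b = (
        b00 * (b11 * b22 - b12 * b12)
        - b01 * (b01 * b22 - b12 * b02)
        + b02 * (b01 * b12 - b11 * b02)
    )
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp against rounding error
    phi = math.acos(r) / 3.0
    lam = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)  # smallest root
    # Rows of (m - lam*I) are orthogonal to the eigenvector we want.
    rows = [(a - lam, d, e), (d, b - lam, f), (e, f, c - lam)]
    candidates = [cross(rows[0], rows[1]), cross(rows[0], rows[2]), cross(rows[1], rows[2])]
    v = max(candidates, key=lambda w: sum(x * x for x in w))
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)


def least_squares_epipole(lines):
    """Epipole minimizing the sum of squared incidences with the given lines."""
    btb = [[sum(l[i] * l[j] for l in lines) for j in range(3)] for i in range(3)]
    return smallest_eigvec_sym3(btb)
```

For example, passing four lines that all go through the point (2, 3) returns a unit vector proportional to (2, 3, 1); with noisy lines it returns the direction that is closest, in the least-squares sense, to being incident with all of them.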

Altogether, we have a six point algorithm for recovering
both the epipoles, and the projective structure $\alpha_p$,
and for performing re-projection onto any novel view.
We summarize in the following section the 6-point
algorithm.

7.1  Re-projection Using Projective Structure: 6-point Algorithm

We assume we are given two model views of a 3D object,
and that all points of interest are in correspondence. We
assume these correspondences can be based on measures
of correlation, as used in optical-flow methods (see also
Shashua  1991,  Bachelder  &  Ullman  1992  for  methods  for 
extracting  correspondences  using  combination  of  optical 
flow  and  affine  geometry). 

Given a novel view we extract six corresponding points
(with one of the model views): $p_j \leftrightarrow p'_j \leftrightarrow p''_j$,
$j = 1, ..., 6$. We assume the first four points are projected
from four coplanar points, and the other corresponding
points  are  projected  from  points  that  are  not  on  the  ref¬ 
erence  plane.  Without  loss  of  generality,  we  assume  the 
standard  coordinate  representation  of  the  image  planes, 
i.e.,  the  image  coordinates  are  embedded  in  a  3D  vec¬ 
tor  whose  third  component  is  set  to  1  (see  Appendix  A). 
The  computations  for  recovering  projective  shape  and 
performing  re-projection  are  described  below. 

1: Recover the transformation A that satisfies $\rho_j p'_j = A p_j$,
$j = 1, ..., 4$. This requires setting up a linear
system of eight equations (see Appendix A). Apply
the transformation to all points p, denoting $\tilde{p} = Ap$.
Also recover the epipoles $V_l = (p'_5 \times \tilde{p}_5) \times (p'_6 \times \tilde{p}_6)$
and $V_r = (p_5 \times A^{-1}p'_5) \times (p_6 \times A^{-1}p'_6)$.

2: Recover the transformation E that satisfies $\rho V_l = E V_r$
and $\rho_j p'_j = E p_j$, $j = 4, 5, 6$.

3: Compute the cross-ratio of the points $p'$, $Ap$, $Ep$, $V_l$,
for all points p and denote that by $\alpha_p$ (see Appendix B
for details on computing the cross-ratio
of four rays).

4: Perform step 1 between the first and novel view:
recover $\tilde{A}$ that satisfies $\rho_j p''_j = \tilde{A} p_j$, $j = 1, ..., 4$,
apply $\tilde{A}$ to all points p and denote that by $\tilde{p}'' = \tilde{A}p$,
recover the epipoles $V_{ln} = (p''_5 \times \tilde{p}''_5) \times (p''_6 \times \tilde{p}''_6)$ and
$V_{rn} = (p_5 \times \tilde{A}^{-1}p''_5) \times (p_6 \times \tilde{A}^{-1}p''_6)$.

5: Perform step 2 between the first and novel view:
recover the transformation $\tilde{E}$ that satisfies $\rho V_{ln} = \tilde{E} V_{rn}$
and $\rho_j p''_j = \tilde{E} p_j$, $j = 4, 5, 6$.

6: For every point p, recover $p''$ from the cross-ratio $\alpha_p$
and the three rays $\tilde{A}p$, $\tilde{E}p$, $V_{ln}$. Normalize $p''$ such
that its third coordinate is set to 1.

The entire procedure requires setting up a linear system
of eight equations four times (Steps 1, 2, 4 and 5) and
computing cross-ratios (linear operations as well).
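
The eight-equation linear system used in Steps 1, 2, 4 and 5 can be sketched as follows. This is an illustration, not the Appendix A formulation: the function names are hypothetical, points are passed as (x, y) pairs in standard coordinates, and the lower-right entry of the transformation is fixed to 1 to remove the scale ambiguity, which is an assumption that fails if that entry is actually zero. A degenerate configuration (three collinear points) leads to a division by zero in the solver.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_four_points(src, dst):
    """3x3 projective transformation mapping four src points onto four dst points."""
    rows, rhs = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # Two linear equations per correspondence, eight in total.
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -xp * x, -xp * y])
        rhs.append(xp)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -yp * x, -yp * y])
        rhs.append(yp)
    h = solve_linear(rows, rhs)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(a, point):
    """Apply a 3x3 transformation to an (x, y) point and renormalize."""
    x, y = point
    w = a[2][0] * x + a[2][1] * y + a[2][2]
    return (
        (a[0][0] * x + a[0][1] * y + a[0][2]) / w,
        (a[1][0] * x + a[1][1] * y + a[1][2]) / w,
    )
```

For the transformation E, where one of the four correspondences is the pair of epipoles, the same system applies only when both epipoles are Euclidean; an ideal epipole would need the homogeneous form of the equations instead.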

We discuss below an important property of this procedure,
namely its transparency with respect to the projection
model: central and parallel projection are treated
alike. This property has implications for the stability
of re-projection, no matter what degree of perspective
distortion is present in the images.

7.2  The  Case  of  Parallel  Projection 

The construction for obtaining projective structure is
well defined for all central projections, including the case
where the center of projection is an ideal point, as
happens under parallel projection. The construction
has two components: the first component has to do with
recovering the epipolar geometry via reference planes,
and the second component is the projective invariant $\alpha_p$.

From  Proposition  1  the  projective  transformations  A 
and  E  can  be  uniquely  determined  from  three  corre¬ 
sponding  points  and  the  corresponding  epipoles.  If  both 
epipoles  are  ideal,  the  transformations  become  affine 
transformations  of  the  plane  (an  affine  transformation 
separates  ideal  points  from  Euclidean  points).  All  other 
possibilities  (both  epipoles  are  Euclidean,  one  epipole 
Euclidean  and  the  other  epipole  ideal)  lead  to  projective 
transformations.  Because  a  projectivity  of  the  projec¬ 
tive  plane  is  uniquely  determined  from  any  four  points 
on  the  projective  plane  (provided  no  three  are  collinear), 
the  transformations  A  and  E  are  uniquely  determined 
under  all  situations  of  central  projection  —  including 
parallel  projection. 

The projective invariant $\alpha_p$ is the same as the one
defined under parallel projection (Section 5): affine
structure is a particular instance of projective structure
in which the epipole $V_l$ is an ideal point. By using the
same invariant for both parallel and central projection,
and because all other elements of the geometric construction
hold for both projection models, the overall system
is transparent to the projection model being used.

The first implication of this property has to do with
stability. Projective structure does not require any perspective
distortions, therefore all imaging situations can
be handled, whether the field of view is wide or narrow. The second
implication  is  that  3D  visual  recognition  from  2D  images 
can  be  achieved  in  a  uniform  manner  with  regard  to  the 
projection  model.  For  instance,  we  can  recognize  (via  re¬ 
projection)  a  perspective  image  of  an  object  from  only 
two  orthographic  model  images,  and  in  general  any  com¬ 
bination  of  perspective  and  orthographic  images  serving 
as  model  or  novel  views  is  allowed. 

The  results  so  far  required  prior  knowledge  (or  as¬ 
sumption)  that  four  of  the  corresponding  points  are  com¬ 
ing  from  coplanar  points  in  space.  This  requirement  can 
be  avoided,  using  two  more  corresponding  points  (mak¬ 
ing  eight  points  overall),  and  is  described  in  the  next 
section. 


8  Epipoles  from  Eight  Points 

We  adopt  a  recent  algorithm  suggested  by  Faugeras 
(1992) which is based on Longuet-Higgins' (1981) fundamental
matrix. The method is very simple and requires
eight  corresponding  points  for  recovering  the  epipoles. 

Let F be an epipolar transformation, i.e., $Fl = \rho l'$,
where $l = V_r \times p$ and $l' = V_l \times p'$ are corresponding
epipolar lines. We can rewrite the projective relation of
epipolar lines using the matrix form of cross-products:

$$F(V_r \times p) = F[V_r]p = \rho l',$$

where $[V_r]$ is a skew symmetric matrix (and hence has
rank 2). From the point/line incidence property we have
that $p' \cdot l' = 0$ and therefore, $p'^TF[V_r]p = 0$, or $p'^THp = 0$
where $H = F[V_r]$. The matrix H is known as the fundamental
matrix introduced by Longuet-Higgins (1981),
and is of rank 2. One can recover H (up to a scale factor)
directly from eight corresponding points, or by using a
principal components approach if more than eight points
are available. Finally, it is easy to see that

$$HV_r = 0,$$

and therefore the epipole $V_r$ can be uniquely recovered
(up to a scale factor). Note that the determinant of
the first principal minor of H vanishes in the case where
$V_r$ is an ideal point, i.e., $h_{11}h_{22} - h_{12}h_{21} = 0$. In that
case, the x,y components of $V_r$ can be recovered (up to
a scale factor) from the third row of H. The epipoles,
therefore,  can  be  uniquely  recovered  under  both  central 
and  parallel  projection.  We  have  arrived  at  the  following 
theorem: 

Theorem 2 In the case where we have eight corresponding
points of two views taken under central projection
(including parallel projection), four of these points, coming
from four non-coplanar points in space, are sufficient
for computing the projective structure invariant $\alpha_p$
for the remaining four points and for all other points in
space projecting onto corresponding points in both views.
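
Once H is available, recovering the epipole from $HV_r = 0$ needs no general-purpose solver. The sketch below is an illustration with hypothetical function names; it assumes H is given as a 3x3 list of rows and has rank exactly 2 (with noisy data H would first have to be forced to rank 2, which is not shown). Every row of H is orthogonal to the epipole, so the cross product of two independent rows is proportional to it.

```python
def cross(u, v):
    """Cross product of two 3-vectors given as sequences."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def epipole_from_h(h):
    """Null vector of a rank-2 matrix H, i.e. V_r with H*V_r = 0 (up to scale)."""
    candidates = [cross(h[0], h[1]), cross(h[0], h[2]), cross(h[1], h[2])]
    # Pick the numerically largest candidate to avoid a nearly dependent row pair.
    return max(candidates, key=lambda w: sum(x * x for x in w))


def epipole_is_ideal(h, tol=1e-12):
    """True when the first principal 2x2 minor of H vanishes (within tol)."""
    return abs(h[0][0] * h[1][1] - h[0][1] * h[1][0]) <= tol


def ideal_epipole_xy(h):
    """x, y components of an ideal epipole, read off the third row of H."""
    return (h[2][1], -h[2][0])
```

The tolerance `tol` is an assumed parameter; in practice it would have to be chosen relative to the scale of H and the noise level.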

We summarize in the following section the 8-point
scheme  for  reconstructing  projective  structure  and  per¬ 
forming  re-projection  onto  a  novel  view. 

8.1  8-point  Re-projection  Algorithm 

We assume we have eight corresponding points between
two model views and the novel view, $p_j \leftrightarrow p'_j \leftrightarrow p''_j$,
$j = 1, ..., 8$, and that the first four points are coming from
four  non-coplanar  points  in  space.  The  computations 
for  recovering  projective  structure  and  performing  re¬ 
projection  are  described  below. 

1: Recover the fundamental matrix H (up to a scale
factor) that satisfies $p'^T_jHp_j = 0$, $j = 1, ..., 8$. The right
epipole $V_r$ then satisfies $HV_r = 0$. Similarly, the
left epipole is recovered from the relation $p^T\tilde{H}p' = 0$ and
$\tilde{H}V_l = 0$.

2: Recover the transformation A that satisfies $\rho V_l = AV_r$
and $\rho_j p'_j = Ap_j$, $j = 1, 2, 3$. Similarly, recover
the transformation E that satisfies $\rho V_l = EV_r$ and
$\rho_j p'_j = Ep_j$, $j = 2, 3, 4$.

3: Compute $\alpha_p$ as the cross-ratio of $p'$, $Ap$, $Ep$, $V_l$, for
all points p.

4: Perform steps 1 and 2 between the first and novel
view: recover the epipoles $V_{rn}$, $V_{ln}$, and the
transformations $\tilde{A}$ and $\tilde{E}$.

5: For every point p, recover $p''$ from the cross-ratio $\alpha_p$
and the three rays $\tilde{A}p$, $\tilde{E}p$, $V_{ln}$. Normalize $p''$ such
that its third coordinate is set to 1.
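
The final re-projection step, recovering a point from a stored cross-ratio and three rays, can be sketched as below. The function names are hypothetical and the sketch rests on the linear-combination form of the cross-ratio: if the novel-view epipole is written as a combination of the two transferred rays with coefficient ratio k', then the re-projected point has coefficient ratio k equal to the cross-ratio times k'. It assumes the two transferred rays are not parallel and that the re-projected point is not ideal, since it divides by the third coordinate.

```python
def combo_coeffs(target, a, b):
    """Least-squares (s, t) such that s*a + t*b approximates target (3-vectors)."""
    aa = sum(x * x for x in a)
    bb = sum(x * x for x in b)
    ab = sum(x * y for x, y in zip(a, b))
    at = sum(x * y for x, y in zip(a, target))
    bt = sum(x * y for x, y in zip(b, target))
    det = aa * bb - ab * ab  # zero only if a and b are parallel (assumed not)
    return (at * bb - bt * ab) / det, (bt * aa - at * ab) / det


def reproject_point(alpha, ray_a, ray_e, epipole):
    """Recover the novel-view point from the invariant alpha and three rays.

    ray_a and ray_e are the point transferred by the two reference-plane
    transformations of the novel view; epipole is the novel-view left epipole.
    """
    s, t = combo_coeffs(epipole, ray_a, ray_e)
    k_prime = t / s          # epipole ~ ray_a + k' * ray_e
    k = alpha * k_prime      # alpha = k / k'
    p = tuple(x + k * y for x, y in zip(ray_a, ray_e))
    return tuple(c / p[2] for c in p)  # third coordinate set to 1


# With rays at x = 0 and x = 1, epipole at x = 3 and alpha = 4/3,
# the re-projected point lies at x = 2 on the same line.
p_novel = reproject_point(4.0 / 3.0, (0, 0, 1), (1, 0, 1), (3, 0, 1))
```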

We  discuss  next  the  possibility  of  working  with  a  rigid 
camera  (i.e.,  perspective  projection  and  calibrated  cam¬ 
era). 


9  The  Rigid  Camera  Case 


The  advantage  of  the  non-rigid  camera  model  (or  the 
central  projection  model)  used  so  far  is  that  images  can 
be  obtained  from  uncalibrated  cameras.  The  price  paid 
for  this  property  is  that  the  images  that  produce  the 
same  projective  structure  invariant  (equivalence  class  of 
images  of  the  object)  can  be  produced  by  applying  non- 
rigid  transformations  of  the  object,  in  addition  to  rigid 
transformations. 

In  this  section  we  show  that  it  is  possible  to  verify 
whether  the  images  were  produced  by  rigid  transfor¬ 
mations,  which  is  equivalent  to  working  with  perspec¬ 
tive  projection  assuming  the  cameras  are  internally  cal¬ 
ibrated.  This  can  be  done  for  both  schemes  presented 
above,  i.e.,  the  6-point  and  8-point  algorithms.  In  both 
cases  we  exclude  orthographic  projection  and  assume 
only  perspective  projection. 

In  the  perspective  case,  the  second  reference  plane  is 
the  image  plane  of  the  first  model  view,  and  the  trans¬ 
formation  for  projecting  the  second  reference  plane  onto 
any  other  view  is  the  rotational  component  of  camera 
motion  (rigid  transformation).  We  recover  the  rota¬ 
tional  component  of  camera  motion  by  adopting  a  re¬ 
sult  derived  by  Lee  (1988),  who  shows  that  the  rota¬ 
tional  component  of  motion  can  be  uniquely  determined 
from  two  corresponding  points  and  the  corresponding 
epipoles.  We  then  show  that  projective  structure  can  be 
uniquely  determined,  up  to  a  uniform  scale  factor,  from 
two  calibrated  perspective  images. 


Proposition  3  (Lee,  1988)  In  the  case  of  perspective 
projection,  the  rotational  component  of  camera  motion 
can  be  uniquely  recovered,  up  to  a  reflection,  from  two 
corresponding  points  and  the  corresponding  epipoles. 
The  reflection  component  can  also  be  uniquely  deter¬ 
mined  by  using  a  third  corresponding  point. 


Proof: Let $l'_j = p'_j \times V_l$ and $l_j = p_j \times V_r$, $j = 1, 2$,
be two corresponding epipolar lines. Because R is an orthogonal
matrix, it leaves vector magnitudes unchanged,
and we can normalize the length of $l'_1$, $l'_2$, $V_l$ to be of the
same length of $l_1$, $l_2$, $V_r$, respectively. We have therefore,
$l'_j = Rl_j$, $j = 1, 2$, and $V_l = RV_r$, which is sufficient for
determining R up to a reflection. Note that because R
is a rigid transformation, it is both an epipolar and an
induced epipolar transformation (the induced transformation
E is determined by $E = (R^{-1})^T$, therefore $E = R$
because R is an orthogonal matrix).




Figure  5:  Illustration  that  projective  shape  can  be  re¬ 
covered  only  up  to  a  uniform  scale  (see  text). 


To determine the reflection component, it is sufficient
to observe a third corresponding point $p_3 \leftrightarrow p'_3$. The
object point $P_3$ is along the ray $V_1p_3$ and therefore has
the coordinates $\alpha_3 p_3$ (w.r.t. the first camera coordinate
frame), and is also along the ray $V_2p'_3$ and therefore has
the coordinates $\alpha'_3 p'_3$ (w.r.t. the second camera coordinate
frame). We note that the ratio between $\alpha_3$ and
$\alpha'_3$ is a positive number. The change of coordinates is
represented by:

$$\beta V_l + \alpha_3 Rp_3 = \alpha'_3 p'_3,$$

where $\beta$ is an unknown constant. If we multiply both
sides of the equation by $l'_j$, $j = 1, 2, 3$, the term $\beta V_l$
drops out, because $V_l$ is incident to all left epipolar lines,
and after substituting $l'^T_j$ with $l^T_jR^T$, we are left with,

$$\alpha_3 l^T_j p_3 = \alpha'_3 l'^T_j p'_3,$$

which is sufficient for determining the sign of $l'_j$. □

The rotation matrix R can be uniquely recovered from
any three corresponding points and the corresponding
epipoles. Projective structure can be reconstructed by
replacing the transformation E of the second reference
plane, with the rigid transformation R (which is equivalent
to treating the first image plane as a reference
plane). We show next that this can lead to projective
structure up to an unknown uniform scale factor (unlike
the non-rigid camera case).

Proposition 4 In the perspective case, the projective
shape constant $\alpha_p$ can be determined, from two views,
at most up to a uniform scale factor.

Proof: Consider Figure 5, and let the effective translation
be $V_2 - V_s = k(V_2 - V_1)$, which is the true translation
scaled by an unknown factor k. Projective shape,
$\alpha_p$, remains fixed if the scene and the focal length of the
first view are scaled by k: from similarity of triangles we
have,

$$k = \frac{V_s - V_2}{V_1 - V_2} = \frac{p_s - V_s}{p - V_1} = \frac{f_s}{1} = \frac{P_s - V_s}{P - V_1} = \frac{P_s - V_2}{P - V_2},$$

where $f_s$ is the scaled focal length of the first view. Since
the magnitude of the translation along the line $V_1V_2$ is
irrecoverable, we can assume it is null, and compute $\alpha_p$
as the cross-ratio of $p'$, $Ap$, $Rp$, $V_l$ which determines projective
structure up to a uniform scale. □

Figure 6: The basic object configuration for the experimental
set-up.

Because $\alpha_p$ is determined up to a uniform scale, we
need an additional point in order to establish a common
scale during the process of re-projection (we can use one
of the existing six or eight points we already have). We
obtain, therefore, the following result:

Theorem 3 In the perspective case, a rigid re-projection
from two model views onto a novel view is possible,
using four corresponding points coming from four
non-coplanar points, and the corresponding epipoles.
The projective structure computed from two perspective
images is invariant up to an overall scale factor.

Orthographic  projection  is  excluded  from  this  result 
because  it  is  well  known  that  the  rotational  component 
cannot  be  uniquely  determined  from  two  orthographic 
views  (Ullman  1979,  Huang  and  Lee  1989,  Aloimonos 
and Brown 1989). To see what happens in the case of
parallel projection note that the epipoles are vectors on
the xy plane of their coordinate systems (ideal points),
and the epipolar lines are two vectors perpendicular to
the epipole vectors. The equation $RV_r = V_l$ takes care
of the rotation in plane (around the optical axis). The
other two equations $Rl_j = l'_j$, $j = 1, 2$, take care only
of rotation around the epipolar direction; rotation
around an axis perpendicular to the epipolar direction
is not accounted for. The equations for solving for R
provide a non-singular system of equations but do produce
a rotation matrix with no rotational components
around an axis perpendicular to the epipolar direction.

10  Simulation Results Using Synthetic Objects

We  ran  simulations  using  synthetic  objects  to  illustrate 
the  re-projection  process  using  the  6-point  scheme  under 
various  imaging  situations.  We  also  tested  the  robust¬ 
ness  of  the  re-projection  method  under  various  types  of 
noise. Because the 6-point scheme requires that four of
the corresponding points be projected from four coplanar
points in space, it is of special interest to see how
the method behaves under conditions that violate this
assumption, and under noise conditions in general. The
stability of the 8-point algorithm largely depends on the
method for recovering the epipoles. The method adopted
from Faugeras (1992), described in Section 8, based on
the fundamental matrix, tends to be very sensitive to
noise if the minimal number of points (eight points) is
used. We have, therefore, focused the experimental error
analysis on the 6-point scheme.

Figure  6  illustrates  the  experimental  set-up.  The  ob¬ 
ject  consists  of  26  points  in  space  arranged  in  the  follow¬ 
ing  manner:  14  points  are  on  a  plane  (reference  plane) 
ortho-parallel  to  the  image  plane,  and  12  points  are  out 
of  the  reference  plane.  The  reference  plane  is  located 
two focal lengths away from the center of projection (focal
length is set to 50 units). The depth of out-of-plane
points varies randomly between 10 and 25 units away from
the reference plane. The x,y coordinates of all points,
except the points $P_1, ..., P_6$, vary randomly between 0
and 240. The 'privileged' points $P_1, ..., P_6$ have x,y
coordinates that place these points all around the object
(clustering privileged points together will inevitably
contribute to instability).

The  first  view  is  simply  a  perspective  projection  of  the 
object.  The  second  view  is  a  result  of  rotating  the  object 
around  the  point  (128, 128, 100)  with  an  axis  of  rotation 
described  by  the  unit  vector  (0.14,0.7,0.7)  by  an  an¬ 
gle  of  29  degrees,  followed  by  a  perspective  projection 
(note  that  rotation  about  a  point  in  space  is  equivalent 
to  rotation  about  the  center  of  projection  followed  by 
translation).  The  third  (novel)  view  is  constructed  in  a 
similar  manner  with  a  rotation  around  the  unit  vector 
(0.7,0.7,0.14)  by  an  angle  of  17  degrees.  Figure  7  (first 
row)  displays  the  three  views.  Also  in  Figure  7  (second 
row)  we  show  the  result  of  applying  the  transformation 
due to the four coplanar points $p_1, ..., p_4$ (Step 1, see
Section 7.1) to all points in the first view. We see that all
the coplanar points are aligned with their corresponding
points in the second view, and all other points are
situated  along  epipolar  lines.  The  display  on  the  right 
in  the  second  row  shows  the  final  re-projection  result  (8- 
point  and  6-point  methods  produce  the  same  result).  All 
points  re-projected  from  the  two  model  views  are  accu¬ 
rately  (noise-free  experiment)  aligned  with  their  corre¬ 
sponding  points  in  the  novel  view. 

The  third  row  of  Figure  7  illustrates  a  more  challeng¬ 
ing  imaging  situation  (still  noise-free).  The  second  view 
is  orthographically  projected  (and  scaled  by  0.5)  follow¬ 
ing  the  same  rotation  and  translation  as  before,  and  the 
novel  view  is  a  result  of  a  central  projection  onto  a  tilted 
image  plane  (rotated  by  12  degrees  around  a  coplanar 
axis  parallel  to  the  x-axis).  We  have  therefore  the  situ¬ 
ation  of  recognizing  a  non-rigid  perspective  projection 
from  a  novel  viewing  position,  given  a  rigid  perspec¬ 
tive  projection  and  a  rigid  orthographic  projection  from 
two model viewing positions. The 6-point re-projection
scheme was applied with the result that all re-projected
points are in accurate alignment with their corresponding
points in the novel view. Identical results were observed
with the 8-point algorithm.

Figure 7: Illustration of Re-projection. Row 1 (left to right): Three views of the object, two model views and a
novel view, constructed by rigid motion following perspective projection. The filled dots represent $p_1, ..., p_4$ (coplanar
points). Row 2: Overlay of the second view and the first view following the transformation due to the reference
plane (Step 1, Section 7.1). All coplanar points are aligned with their corresponding points, the remaining points are
situated along epipolar lines. The righthand display is the result of re-projection: the re-projected image perfectly
matches the novel image (noise-free situation). Row 3: The lefthand display shows the second view which is now
orthographic. The middle display shows the third view which is now a perspective projection onto a tilted image
plane. The righthand display is the result of re-projection which perfectly matches the novel view.

The  remaining  experiments,  discussed  in  the  follow¬ 
ing  sections,  were  done  under  various  noise  conditions. 
We conducted three types of experiments. The first
experiment tested the stability under the situation where
$P_1, ..., P_4$ are non-coplanar object points. The second
experiment tested stability under random noise added
to all image points in all views, and the third experiment
tested stability under the situation where less noise
is added to the privileged six points than to other points.

10.1  Testing  Deviation  from  Coplanarity 

In this experiment we investigated the effect of translating
$P_1$ along the optical axis (of the first camera position)
from its initial position on the reference plane (z = 100)
to the farthest depth position (z = 125), in increments
of one unit at a time. The experiment was conducted using
several objects of the type described above (the six
privileged points were fixed, the remaining points were
assigned random positions in space in different trials),
undergoing the same motion described above (as in Figure
7, first row). The effect of depth translation to the
level z = 125 on the location of $p_1$ is a shift of 0.93 pixels,
on $p'_1$ is 1.58 pixels, and on the location of $p''_1$ is 3.26
pixels. Depth translation is therefore equivalent to
perturbing the location of the projections of $P_1$ by various
degrees (depending on the 3D motion parameters).

Figure  8  shows  the  average  pixel  error  in  re-projection 
over  the  entire  range  of  depth  translation.  The  average 
pixel  error  was  measured  as  the  average  of  deviations 
from  the  re-projected  point  to  the  actual  location  of  the 
corresponding  point  in  the  novel  view,  taken  over  all 
points.  Figure  8  also  displays  the  result  of  re-projection 
for the case where $P_1$ is at z = 125. The average error
is  1.31,  and  the  maximal  error  (the  point  with  the  most 
deviation)  is  7.1  pixels.  The  alignment  between  the  re¬ 
projected  image  and  the  novel  image  is,  for  the  most 
part,  fairly  accurate. 
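
The error measure used in these experiments (average and maximal deviation between re-projected and actual image locations) is simple to state in code. This is a minimal sketch with a hypothetical function name; points are assumed to be (x, y) pairs in pixels and listed in the same order in both inputs.

```python
import math


def reprojection_errors(reprojected, actual):
    """Return (average, maximal) Euclidean distance between matched points."""
    dists = [
        math.hypot(xr - xa, yr - ya)
        for (xr, yr), (xa, ya) in zip(reprojected, actual)
    ]
    return sum(dists) / len(dists), max(dists)


# Two points: one re-projected exactly, one off by a 3-4-5 triangle.
avg_err, max_err = reprojection_errors([(0.0, 0.0), (3.0, 4.0)],
                                       [(0.0, 0.0), (0.0, 0.0)])
```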

10.2  Situation  of  Random  Noise  to  all  Image 
Locations 

We next add random noise to all image points in all
three views ($P_1$ is set back to the reference plane). This
experiment was done repeatedly over various degrees of
noise and over several objects. The results shown here
have noise between 0-1 pixels randomly added to the x
and y coordinates separately. The maximal perturbation
is therefore $\sqrt{2}$, and because the direction of perturbation
is random, the maximal error in relative location is
double, i.e., 2.8 pixels. Figure 9 shows the average pixel
errors  over  10  trials  (one  particular  object,  the  same  mo¬ 
tion  as  before).  The  average  error  fluctuates  around  1.6 
pixels.  Also  shown  is  the  result  of  re-projection  on  a  typ¬ 
ical  trial  with  average  error  of  1.05  pixels,  and  maximal 
error  of  5.41  pixels.  The  match  between  the  re-projected 
image  and  the  novel  image  is  relatively  good  considering 
the  amount  of  noise  added. 

10.3  Random  Noise  Case  2 

A more realistic situation occurs when the magnitude of
noise associated with the privileged six points is much
lower than the noise associated with other points, for
the  reason  that  we  are  interested  in  tracking  points  of 
interest  that  are  often  associated  with  distinct  inten¬ 
sity  structure  (such  as  the  tip  of  the  eye  in  a  picture 
of  a  face).  Correlation  methods,  for  instance,  are  known 
to  perform  much  better  on  such  locations,  than  on  ar¬ 
eas  having  smooth  intensity  change,  or  areas  where  the 
change  in  intensity  is  one-dimensional.  We  therefore  ap¬ 
plied  a  level  of  0-0.3  perturbation  to  the  x  and  y  coor¬ 
dinates  of  the  six  points,  and  a  level  of  0-1  to  all  other 
points  (as  before).  The  results  are  shown  in  Figure  10. 
The  average  pixel  error  over  10  trials  fluctuates  around 
0.5  pixels,  and  the  re-projection  shown  for  a  typical  trial 
(average  error  0.52,  maximal  error  1.61)  is  in  relatively 
good correspondence with the novel view. With larger
perturbations at a range of 0-2, the algorithm behaves
proportionally well, i.e., the average error over 10 trials
is 1.37.

11  Summary 

In  this  paper  we  focused  on  the  problem  of  recovering 
relative,  non-metric,  structure  from  two  views  of  a  3D 
object.  Specifically,  the  invariant  structure  we  recover 
does  not  require  internal  camera  calibration,  does  not 
involve  full  reconstruction  of  shape  (Euclidean  or  pro¬ 
jective  coordinates),  and  treats  parallel  and  central  pro¬ 
jection  as  an  integral  part  of  one  unified  system.  We 
have also shown that the invariant can be used for the
purposes  of  visual  recognition,  within  the  framework  of 
the  alignment  approach  to  recognition. 

The study is based on an extension of Koenderink and
Van Doorn's representation of affine structure as an invariant
defined with respect to a reference plane and
a  reference  point.  We  first  showed  that  the  KV  affine 
invariant  cannot  be  extended  directly  to  a  projective  in¬ 
variant  (Appendix  D),  but  there  exists  another  affine  in¬ 
variant,  described  with  respect  to  two  reference  planes, 
that  can  easily  be  extended  to  projective  space.  As  a 
result  we  obtained  the  projective  structure  invariant. 

We have shown that the difference between the affine
and projective cases lies entirely in the location of epipoles,
i.e.,  given  the  location  of  epipoles  both  the  affine  and 
projective  structure  are  constructed  from  the  same  infor¬ 
mation  captured  by  four  corresponding  points  projected 
from  four  non-coplanar  points  in  space.  Therefore,  the 
additional  corresponding  points  in  the  projective  case 
are  used  solely  for  recovering  the  location  of  epipoles. 

We  have  shown  that  the  location  of  epipoles  can  be 
recovered  under  both  parallel  and  central  projection  us¬ 
ing  six  corresponding  points,  with  the  assumption  that 
four  of  those  points  are  projected  from  four  coplanar 
points  in  space,  or  alternatively  by  having  eight  cor¬ 
responding  points  without  assumptions  on  coplanarity. 
The  overall  method  for  reconstructing  projective  struc¬ 
ture  and  achieving  re-projection  was  referred  to  as  the  6- 
point  and  the  8-point  algorithms.  These  algorithms  have 
the  unique  property  that  projective  structure  can  be  re¬ 
covered  from  both  orthographic  and  perspective  images 
from uncalibrated cameras. This property implies, for
instance, that we can perform recognition of a perspective
image of an object given two orthographic images as
a model. It also implies greater stability because the size
of the field of view is no longer an issue in the process of
reconstructing shape or performing re-projection.

Figure 8: Deviation from coplanarity: average pixel error due to translation of Pi along the optical axis from z = 100
to z = 125, by increments of one unit. The result of re-projection (overlay of re-projected image and novel image)
for the case z = 125. The average error is 1.31 and the maximal error is 7.1.

Figure 9: Random noise added to all image points, over all views, for 10 trials. Average pixel error fluctuates around
1.6 pixels. The result of re-projection on a typical trial with average error of 1.05 pixels, and maximal error of 5.41
pixels.
A  Fundamental Theorem of Plane Projectivity

The  fundamental  theorem  of  plane  projectivity  states 
that  a  projective  transformation  of  the  plane  is  com¬ 
pletely  determined  by  four  corresponding  points.  We 
prove  the  theorem  by  first  using  a  geometric  drawing, 
and  then  algebraically  by  introducing  the  concept  of  rays 
(homogeneous  coordinates).  The  appendix  ends  with  the 
system of linear equations for determining the correspondence
of all points in the plane, given four corresponding
points (used repeatedly throughout this paper).

Definitions: A perspectivity between two planes is
defined as a central projection from one plane onto the
other. A projectivity is defined as a finite sequence of
perspectivities. A projectivity, when represented in an
algebraic form, is called a projective transformation.
The fundamental theorem states that a projectivity is
completely determined by four corresponding points.

Geometric  Illustration 

Consider the geometric drawing in Figure 11. Let
$A, B, C, U$ be four coplanar points in the scene, and let
$A', B', C', U'$ be their projection in the first view, and
$A'', B'', C'', U''$ be their projection in the second view.
By construction, the two views are projectively related
to each other. We further assume that no three of the
points are collinear (four points form a quadrangle), and
without loss of generality let $U$ be located within the
triangle $ABC$. Let $BC$ be the x-axis and $BA$ be the
y-axis. The projection of $U$ onto the x-axis, denoted by
$U_x$, is the intersection of the line $AU$ with the x-axis.
Similarly $U_y$ is the intersection of the line $CU$ with the
y-axis. Because straight lines project onto straight lines,
we have that $U_x, U_y$ correspond to $U'_x, U'_y$ if and only if $U$
corresponds to $U'$. For any other point $P$, coplanar with
$ABCU$ in space, its coordinates $P_x, P_y$ are constructed
in a similar manner. We therefore have that $B, U_x, P_x, C$
are collinear and therefore the cross ratio must be equal
to the cross ratio of $B', U'_x, P'_x, C'$, i.e.

$$\frac{BC \cdot U_xP_x}{BP_x \cdot U_xC} = \frac{B'C' \cdot U'_xP'_x}{B'P'_x \cdot U'_xC'}.$$

This form of cross ratio is known as the canonical cross
ratio. In general there are 24 cross ratios, six of which are
numerically different (see Appendix B for more details
on cross-ratios). Similarly, the cross ratio along the y-axis
of the reference frame is equal to the cross ratio of
the corresponding points in both views.

Figure 10: Random noise added to non-privileged image points, over all views, for 10 trials. Average pixel error
fluctuates around 0.5 pixels. The result of re-projection on a typical trial with average error of 0.52 pixels, and
maximal error of 1.61 pixels.

Figure 11: The geometry underlying plane projectivity
from four points.

Therefore, for any point $p'$ in the first view, we construct
its x and y locations, $p'_x$ and $p'_y$, along $B'C'$ and $B'A'$,
respectively. From the equality of cross ratios we find
the locations of $p''_x, p''_y$, and that leads to $p''$. Because
we have used only projective constructions, i.e. straight
lines project to straight lines, we are guaranteed that $p'$
and $p''$ are corresponding points.
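
The invariance that this construction relies on can be checked numerically. The sketch below is only an illustration: the matrix `T`, the sample points, and the helper names `cross_ratio` and `apply_projectivity` are arbitrary assumptions, not quantities from the experiments. It evaluates the canonical cross ratio of four collinear points before and after an arbitrary non-singular projective transformation.

```python
import numpy as np

def cross_ratio(B, U, P, C):
    """Canonical cross ratio (BC * UP) / (BP * UC) of four collinear points."""
    B, U, P, C = (np.asarray(v, dtype=float) for v in (B, U, P, C))
    axis = (C - B) / np.linalg.norm(C - B)

    def seg(X, Y):
        # Signed length of the segment XY measured along the line.
        return float((Y - X) @ axis)

    return (seg(B, C) * seg(U, P)) / (seg(B, P) * seg(U, C))

def apply_projectivity(T, point):
    """Map a 2D point through the 3x3 matrix T and normalize."""
    q = T @ np.array([point[0], point[1], 1.0])
    return q[:2] / q[2]

# Arbitrary non-singular projective transformation (an assumption for illustration).
T = np.array([[1.0, 0.2, 3.0],
              [0.1, 0.9, -2.0],
              [0.001, 0.002, 1.0]])

B, U_x, P_x, C = [0.0, 0.0], [3.0, 0.0], [7.0, 0.0], [10.0, 0.0]
before = cross_ratio(B, U_x, P_x, C)
after = cross_ratio(*(apply_projectivity(T, v) for v in (B, U_x, P_x, C)))
```

Here `before` is 40/49 and `after` agrees with it up to rounding, so knowing three of the four image points and the cross ratio fixes the fourth.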


Algebraic  Derivation 

From an algebraic point of view it is convenient to view
points as lying on rays emanating from the center of
projection. A ray representation is also called the homogeneous
coordinates representation of the plane, and is
achieved by adding a third coordinate. Two vectors represent
the same point $X = (x, y, z)$ if they differ at most
by a scale factor (different locations along the same ray).
A key result, which makes this representation amenable
to application of linear algebra to geometry, is described
in the following proposition:

Proposition  5  A  projectivity  of  the  plane  is  equivalent 
to  a  linear  transformation  of  the  homogeneous  represen¬ 
tation. 

The proof is omitted here, and can be found in Tuller
(1967, Theorems 5.22, 5.24). A projectivity is equivalent,
therefore, to a linear transformation applied to
the rays. Because the correspondence between points
and coordinates is not one-to-one, we have to take scalar
factors of proportionality into account when representing
a projective transformation. An arbitrary projective
transformation of the plane can be represented as a
non-singular linear transformation (also called collineation)
$\rho X' = TX$, where $\rho$ is an arbitrary scale factor.

Given four corresponding rays $p_j = (x_j, y_j, 1) \leftrightarrow
p'_j = (x'_j, y'_j, 1)$, we would like to find a linear transformation
$T$ and the scalars $\rho_j$ such that $\rho_j p'_j = T p_j$. Note
that because only ratios are involved, we can set $\rho_4 = 1$.
The following are a basic lemma and theorem adapted
from Semple and Kneebone (1952).

Lemma 1 If $p_1, \ldots, p_4$ are four vectors in $R^3$, no three of
which are linearly dependent, and if $e_1, \ldots, e_4$ are respectively
the vectors $(1,0,0), (0,1,0), (0,0,1), (1,1,1)$, there
exists a non-singular linear transformation $A$ such that
$A e_j = \lambda_j p_j$, where the $\lambda_j$ are non-zero scalars; and the
matrices of any two transformations with this property
differ at most by a scalar factor.

Proof: Let $p_j$ have the components $(x_j, y_j, 1)$, and without
loss of generality let $\lambda_4 = 1$. The matrix $A$ satisfies
the three conditions $A e_j = \lambda_j p_j$, $j = 1, 2, 3$ if and only if
$\lambda_j p_j$ is the $j$'th column of $A$. Because of the fourth condition,
the values $\lambda_1, \lambda_2, \lambda_3$ satisfy

$$[p_1, p_2, p_3] \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \end{pmatrix} = p_4$$

and since, by hypothesis of linear independence of
$p_1, p_2, p_3$, the matrix $[p_1, p_2, p_3]$ is non-singular, the $\lambda_j$
are uniquely determined and non-zero. The matrix $A$ is
therefore determined up to a scalar factor. []

Theorem 4 If $p_1, \ldots, p_4$ and $p'_1, \ldots, p'_4$ are two sets of
four vectors in $R^3$, no three vectors in either set being
linearly dependent, there exists a non-singular linear
transformation $T$ such that $T p_j = \rho_j p'_j$ ($j = 1, \ldots, 4$),
where the $\rho_j$ are scalars; and the matrix $T$ is uniquely
determined apart from a scalar factor.

Proof: By the lemma, we can solve for $A$ and $\lambda_j$ that
satisfy $A e_j = \lambda_j p_j$ ($j = 1, \ldots, 4$), and similarly we can
choose $B$ and $\mu_j$ to satisfy $B e_j = \mu_j p'_j$; and without loss
of generality assume that $\lambda_4 = \mu_4 = 1$. We then have
$T = BA^{-1}$ and $\rho_j = \mu_j / \lambda_j$. If, further, $T p_j = \rho_j p'_j$ and
$U p_j = \sigma_j p'_j$, then $TA e_j = \rho_j \lambda_j p'_j$ and $UA e_j = \sigma_j \lambda_j p'_j$;
and therefore, by the lemma, $TA = \tau UA$, i.e., $T = \tau U$
for some scalar $\tau$. []
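
The construction used in the lemma and in the proof of the theorem translates directly into a few lines of linear algebra. The following is a minimal sketch, assuming NumPy; the function names `basis_map` and `projectivity_from_four` and the test matrix `T_true` are hypothetical and introduced only for illustration.

```python
import numpy as np

def basis_map(points):
    """Lemma 1: matrix A with A e_j = lambda_j p_j and lambda_4 = 1.

    points: four rays p_1..p_4 as rows, no three linearly dependent.
    """
    p = np.asarray(points, dtype=float)
    M = p[:3].T                      # columns are p_1, p_2, p_3
    lam = np.linalg.solve(M, p[3])   # [p_1, p_2, p_3] lambda = p_4
    return M * lam                   # j'th column becomes lambda_j p_j

def projectivity_from_four(points, points_prime):
    """Theorem 4: T = B A^{-1}, determined up to a scalar factor."""
    A = basis_map(points)
    B = basis_map(points_prime)
    return B @ np.linalg.inv(A)

# Hypothetical ground-truth transformation used only to generate test data.
T_true = np.array([[1.0, 0.2, 3.0],
                   [0.1, 0.9, -2.0],
                   [0.001, 0.002, 1.0]])
pts = np.array([[0, 0, 1], [1, 0, 1], [0, 1, 1], [1, 1, 1]], dtype=float)
pts_prime = (T_true @ pts.T).T
pts_prime /= pts_prime[:, 2:3]       # rays of the form (x', y', 1)

T = projectivity_from_four(pts, pts_prime)
```

Because `T` is defined only up to scale, it should be compared with `T_true` after normalizing both, for example by their (3,3) entries.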

The immediate implication of the theorem is that one
can solve directly for $T$ and $\rho_j$ ($\rho_4 = 1$). Four points
provide twelve equations and we have twelve unknowns
(nine for $T$ and three for $\rho_j$). Furthermore, because the
system is linear, one can look for a least squares solution
by using more than four corresponding points (they
all have to be coplanar); each additional point provides
three more equations and one more unknown (the $\rho$
associated with it).

Alternatively, one can eliminate $\rho_j$ from the equations,
set $t_{3,3} = 1$ and set up directly a system of eight linear
equations as follows. In general we have four corresponding
rays $p_j = (x_j, y_j, z_j) \leftrightarrow p'_j = (x'_j, y'_j, z'_j)$,
$j = 1, \ldots, 4$, and the linear transformation $T$ satisfies
$\rho_j p'_j = T p_j$. By eliminating $\rho_j$, each pair of corresponding
rays contributes the following two linear equations:

$$x_j t_{1,1} + y_j t_{1,2} + z_j t_{1,3} - \frac{x_j x'_j}{z'_j} t_{3,1} - \frac{y_j x'_j}{z'_j} t_{3,2} = \frac{z_j x'_j}{z'_j}$$

$$x_j t_{2,1} + y_j t_{2,2} + z_j t_{2,3} - \frac{x_j y'_j}{z'_j} t_{3,1} - \frac{y_j y'_j}{z'_j} t_{3,2} = \frac{z_j y'_j}{z'_j}$$

A similar pair of equations can be derived in the case
$z'_j = 0$ (ideal points) by using either $x'_j$ or $y'_j$ (all three
cannot be zero).

Figure 12: Setting a projectivity under parallel projection.

Projectivity Between Two Image Planes of an
Uncalibrated Camera

We can use the fundamental theorem of plane projectivity
to recover the projective transformation that
was illustrated geometrically in Figure 11. Given four
corresponding points $(x_j, y_j) \leftrightarrow (x'_j, y'_j)$ that are projected
from four coplanar points in space we would like
to find the projective transformation $A$ that accounts
for all other correspondences $(x, y) \leftrightarrow (x', y')$ that are
projected from coplanar points in space.

The standard way to proceed is to assume that both
image planes are parallel to their xy plane with a focal
length of one unit, or in other words to embed the image
coordinates in a 3D vector whose third component
is 1. Let $p_j = (x_j, y_j, 1)$ and $p'_j = (x'_j, y'_j, 1)$ be the
chosen representation of image points. The true coordinates
of those image points may be different (if the image
planes are in different positions than assumed), but the
main point is that all such representations are projectively
equivalent to each other. Therefore, $\rho_j p_j = B \hat{p}_j$
and $\rho'_j p'_j = C \hat{p}'_j$, where $\hat{p}_j$ and $\hat{p}'_j$ are the true image
coordinates of these points. If $T$ is the projective transformation
determined by the four corresponding points
$\hat{p}_j \leftrightarrow \hat{p}'_j$, then $A = CTB^{-1}$ is the projective transformation
between the assumed representations $p_j \leftrightarrow p'_j$.

Therefore, the matrix $A$ can be solved for directly
from the correspondences $p_j \leftrightarrow p'_j$ (the system of
eight equations detailed in the previous section). For
any given point $p = (x, y, 1)$, the corresponding point
$p' = (x', y', 1)$ is determined by $Ap$ followed by normalization
to set the third component back to 1.
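
As a minimal sketch of the system of eight linear equations (with $t_{3,3} = 1$) and of the transfer of further coplanar points, the following assumes NumPy; the names `solve_projectivity` and `transfer`, and the matrix `A_true` used to generate data, are hypothetical. With more than four coplanar correspondences the same call returns a least squares solution.

```python
import numpy as np

def solve_projectivity(src, dst):
    """Solve for the 3x3 transformation with t33 = 1 from n >= 4 ray pairs.

    src, dst: arrays of rays (x, y, z) and (x', y', z') with z' != 0.
    Unknown order: t11, t12, t13, t21, t22, t23, t31, t32.
    """
    rows, rhs = [], []
    for (x, y, z), (xp, yp, zp) in zip(src, dst):
        u, v = xp / zp, yp / zp
        rows.append([x, y, z, 0, 0, 0, -x * u, -y * u])
        rhs.append(z * u)
        rows.append([0, 0, 0, x, y, z, -x * v, -y * v])
        rhs.append(z * v)
    t, *_ = np.linalg.lstsq(np.array(rows, dtype=float),
                            np.array(rhs, dtype=float), rcond=None)
    return np.append(t, 1.0).reshape(3, 3)

def transfer(A, x, y):
    """Map (x, y, 1) through A and set the third component back to 1."""
    q = A @ np.array([x, y, 1.0])
    return q[:2] / q[2]

# Hypothetical transformation used only to generate four correspondences.
A_true = np.array([[1.1, 0.1, 5.0],
                   [-0.2, 0.9, 3.0],
                   [0.001, 0.0005, 1.0]])
src = np.array([[0, 0, 1], [100, 0, 1], [100, 100, 1], [0, 100, 1]], dtype=float)
dst = (A_true @ src.T).T

A = solve_projectivity(src, dst)
p_prime = transfer(A, 40.0, 60.0)
```

Fixing $t_{3,3} = 1$ excludes transformations whose (3,3) entry vanishes; a homogeneous formulation would avoid this, at the cost of a slightly different solver.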


A.l  Plane  Projectivity  in  Affine  Geometry 

In parallel projection we can take advantage of the fact
that parallel lines project to parallel lines. This allows us to
define coordinates on the plane by subtending lines parallel
to the axes (see Figure 12). Note also that the two
trapezoids $BB'p'_xp_x$ and $BB'C'C$ are similar trapezoids,
therefore,

$$\frac{BC}{p_xC} = \frac{B'C'}{p'_xC'}.$$

This provides a geometric derivation of the result that
three points are sufficient to set up a projectivity between
any two planes under parallel projection.

Figure 13: The cross-ratio of four distinct concurrent
rays is equal to the cross-ratio of the four distinct points
that result from intersecting the rays by a transversal.

Algebraically, a projectivity of the plane can be
uniquely represented as a 2D affine transformation of the
non-homogeneous coordinates of the points. Namely, if
$p = (x, y)$ and $p' = (x', y')$ are two corresponding points,
then

$$p' = Ap + w$$

where $A$ is a non-singular matrix and $w$ is a vector. The
six parameters of the transformation can be recovered
from two non-collinear sets of three points, $p_0, p_1, p_2$ and
$p'_0, p'_1, p'_2$. Let

$$A = \begin{bmatrix} x'_1 - x'_0 & x'_2 - x'_0 \\ y'_1 - y'_0 & y'_2 - y'_0 \end{bmatrix} \begin{bmatrix} x_1 - x_0 & x_2 - x_0 \\ y_1 - y_0 & y_2 - y_0 \end{bmatrix}^{-1}$$

and $w = p'_0 - Ap_0$, which together satisfy $p'_j - p'_0 =
A(p_j - p_0)$ for $j = 1, 2$. For any arbitrary point $p$ on
the plane, we have that $p - p_0$ is spanned by the two vectors
$p_1 - p_0$ and $p_2 - p_0$, i.e., $p - p_0 = \alpha_1(p_1 - p_0) + \alpha_2(p_2 - p_0)$; and
because translation in depth is lost in parallel projection,
we have that $p' - p'_0 = \alpha_1(p'_1 - p'_0) + \alpha_2(p'_2 - p'_0)$, therefore
$p' - p'_0 = A(p - p_0)$.
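
The recovery of the six affine parameters from three corresponding points can be sketched as follows, assuming NumPy; the function name `affine_from_three` and the values of `A_true` and `w_true` are hypothetical and serve only to generate test correspondences.

```python
import numpy as np

def affine_from_three(p, p_prime):
    """Recover A (2x2) and w (2-vector) with p' = A p + w from three point pairs."""
    p = np.asarray(p, dtype=float)
    p_prime = np.asarray(p_prime, dtype=float)
    # Columns are the difference vectors p_1 - p_0 and p_2 - p_0 (and their images).
    D = np.column_stack([p[1] - p[0], p[2] - p[0]])
    D_prime = np.column_stack([p_prime[1] - p_prime[0], p_prime[2] - p_prime[0]])
    A = D_prime @ np.linalg.inv(D)
    w = p_prime[0] - A @ p[0]
    return A, w

# Hypothetical affine map used only to generate the three correspondences.
A_true = np.array([[1.2, 0.3], [-0.1, 0.8]])
w_true = np.array([5.0, -2.0])
p = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
p_prime = p @ A_true.T + w_true

A, w = affine_from_three(p, p_prime)
mapped = A @ np.array([0.4, 0.7]) + w     # transfer of a fourth coplanar point
```

The three points must not be collinear; otherwise the matrix of difference vectors is singular and cannot be inverted.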

B  Cross-Ratio  and  the  Linear 
Combination  of  Rays 

The cross-ratio of four collinear points $A, B, C, D$ is preserved
under central projection and is defined as:

$$\alpha = \frac{AB}{AC} \Big/ \frac{DB}{DC} = \frac{A'B'}{A'C'} \Big/ \frac{D'B'}{D'C'},$$

(see Figure 13). All permutations of the four points
are allowed, and in general there are six distinct cross-ratios
that can be computed from four collinear points.
Because the cross-ratio is invariant to projection, any
transversal meeting four distinct concurrent rays in four
distinct points will have the same cross ratio; therefore
one can speak of the cross-ratio of rays (concurrent or
parallel) $a, b, c, d$.

The cross-ratio result in terms of rays, rather than
points, is appealing for the reasons that it enables the application
of linear algebra (rays are represented as points
in homogeneous coordinates), and more important, enables
us to treat ideal points as any other point (critical
for having an algebraic system that is well defined under
both central and parallel projection).

The cross-ratio of rays is computed algebraically
through linear combination of points in homogeneous
coordinates (see Gans 1969, pp. 291-295), as follows.
Let the rays $a, b, c, d$ be represented by vectors
$(a_1, a_2, a_3), \ldots, (d_1, d_2, d_3)$, respectively. We can represent
the rays $a, d$ as a linear combination of the rays
$b, c$, by

$$a = b + kc$$
$$d = b + k'c$$

For example, $k$ can be found by solving the linear system
of three equations $\rho a = b + kc$ with two unknowns $\rho, k$
(one can solve using any two of the three equations, or
find a least squares solution using all three equations).
We shall assume, first, that the points are Euclidean.
The ratio in which $A$ divides the line $BC$ can be derived
by normalizing the vectors to unit third coordinate, which
gives $\frac{AB}{AC} = -k\frac{c_3}{b_3}$.
Similarly, we have $\frac{DB}{DC} = -k'\frac{c_3}{b_3}$ and, therefore, the cross-ratio
of the four rays is $\alpha = \frac{k}{k'}$. The same result holds
under more general conditions, i.e., points can be ideal
as well:

Proposition 6 If $A, B, C, D$ are distinct collinear
points, with homogeneous coordinates $b + kc, b, c, b + k'c$,
then the canonical cross-ratio is $\frac{k}{k'}$.

(for a complete proof, see Gans 1969, pp. 294-295). For
our purposes it is sufficient to consider the case when
one of the points, say the vector $d$, is ideal (i.e. $d_3 = 0$).
From the vector equation $\rho d = b + k'c$, we have that
$k' = -\frac{b_3}{c_3}$ and, therefore, the ratio $\frac{DB}{DC} = 1$. As a result,
the cross-ratio is determined only by the first term, i.e.,
$\alpha = \frac{AB}{AC} = \frac{k}{k'}$, which is what we would expect if we
represented points in the Euclidean plane and allowed
the point $D$ to extend to infinity along the line $A, B, C, D$
(see Figure 13).

The derivation so far can be translated directly to our
purposes of computing the projective shape constant by
replacing $a, b, c, d$ with $p', \tilde{p}', \hat{p}', V_l$, respectively.


C  On Epipolar Transformations

Proposition 7 The epipolar lines $pV_r$ and $p'V_l$ are perspectively
related.

Proof: Consider Figure 14. We have already established
that $p$ projects onto the left epipolar line $p'V_l$.
By definition, the right epipole $V_r$ projects onto the left
epipole $V_l$, therefore, because lines are projective invariants
the line $pV_r$ projects onto the line $p'V_l$. []

The result that epipolar lines in one image are perspectively
related to the epipolar lines in the other image
implies that there exists a projective transformation
$F$ that maps epipolar lines $l_j$ onto epipolar lines $l'_j$, that
is $F l_j = \rho_j l'_j$, where $l_j = p_j \times V_r$ and $l'_j = p'_j \times V_l$. From
the property of point/line duality of projective geometry
(Semple and Kneebone, 1952), the transformation
$E$ that maps points on left epipolar lines onto points on
the corresponding right epipolar lines is induced from $F$,
i.e., $E = (F^{-1})^t$.

Figure 14: Epipolar lines are perspectively related.


Proposition 8 (point/line duality) The
transformation for projecting $p$ onto the left epipolar line
$p'V_l$ is $E = (F^{-1})^t$.

Proof: Let $l, l'$ be corresponding epipolar lines, related
by the equation $\rho l' = Fl$. Let $p, p'$ be any two points,
one on each epipolar line (not necessarily corresponding
points). From the point/line incidence axiom we have
that $l^t \cdot p = 0$. By substituting $l$ we have

$$[\rho F^{-1} l']^t \cdot p = 0 \implies \rho\, l'^t \cdot [(F^{-1})^t p] = 0.$$

Therefore, the collineation $E = (F^{-1})^t$ maps points $p$
onto the corresponding left epipolar line. []
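
The duality argument can be verified numerically. In the sketch below the matrix `F` is an arbitrary non-singular collineation of lines chosen only for illustration (it is not an epipolar transformation recovered from images), and the line `l` is likewise hypothetical. Points on `l` are mapped by $E = (F^{-1})^t$ and tested for incidence with $l' = Fl$.

```python
import numpy as np

# Arbitrary non-singular line collineation (an assumption for illustration).
F = np.array([[2.0, 0.5, 1.0],
              [0.0, 1.5, -1.0],
              [0.3, 0.0, 1.0]])
E = np.linalg.inv(F).T            # induced point transformation

l = np.array([1.0, -2.0, 3.0])    # a line in homogeneous coordinates
l_prime = F @ l                   # its image under F (scale is irrelevant)

# Two distinct points on l, obtained as intersections of l with other lines.
p_a = np.cross(l, [1.0, 0.0, 0.0])
p_b = np.cross(l, [0.0, 1.0, 0.0])

# Incidence residuals l'^t (E p); both vanish up to rounding.
residuals = [float(l_prime @ (E @ p)) for p in (p_a, p_b)]
```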

It is intuitively clear that the epipolar line transformation
$F$ is not unique, and therefore the induced transformation
$E$ is not unique either. The correspondence
between the epipolar lines is not disturbed under translation
along the line $V_lV_r$, or under non-rigid camera
motion that results from tilting the image plane with respect
to the optical axis such that the epipole remains
on the line $V_lV_r$.

Proposition  9  The  epipolar  transformation  F  is  not 
unique. 

Proof: A projective transformation is determined
by four corresponding pencils. The transformation is
unique (up to a scale factor) if no three of the pencils are
linearly dependent, i.e., if the pencils are lines, then no
three of the four lines should be coplanar. The epipolar
line transformation $F$ can be determined by the corresponding
epipoles, $V_r \leftrightarrow V_l$, and three corresponding
epipolar lines $l_j \leftrightarrow l'_j$. We show next that the epipolar
lines are coplanar, and therefore, $F$ cannot be determined
uniquely.

Let $p_j$ and $p'_j$, $j = 1, 2, 3$, be three corresponding
points and let $l_j = p_j \times V_r$ and $l'_j = p'_j \times V_l$. Let
$\hat{p}_3 = \alpha p_1 + \beta p_2$, $\alpha + \beta = 1$, be a point on the epipolar
line $p_3V_r$ collinear with $p_1, p_2$. We have,

$$l_3 = p_3 \times V_r = (a\hat{p}_3 + bV_r) \times V_r = a\hat{p}_3 \times V_r = a\alpha l_1 + a\beta l_2,$$

and similarly $l'_3 = \alpha' l'_1 + \beta' l'_2$. []


Figure  15:  See  text. 


The epipolar transformation, therefore, has three free
parameters (one for scale, the other two because the
equation $F l_3 = \rho_3 l'_3$ has dropped out).
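
The linear dependence of epipolar lines that underlies this count can be seen in a small numerical sketch. The epipole `V_r` and the image points below are hypothetical values chosen for illustration; any number of epipolar lines through the same epipole span only a two-dimensional space.

```python
import numpy as np

V_r = np.array([4.0, 1.0, 1.0])                  # hypothetical epipole
points = np.array([[0.0, 0.0, 1.0],
                   [1.0, 2.0, 1.0],
                   [3.0, -1.0, 1.0],
                   [2.0, 5.0, 1.0]])             # hypothetical image points

lines = np.cross(points, V_r)                    # l_j = p_j x V_r, one line per row
rank = np.linalg.matrix_rank(lines)              # dimension of the span of the lines
incidence = lines @ V_r                          # every line passes through V_r
```

The rank is 2 rather than 3, so a third epipolar line adds no independent constraint on $F$.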

D  Affine Structure in Projective Space

Proposition 10 The affine structure invariant, based
on a single reference plane and a reference point, cannot
be directly extended to central projection.

Proof: Consider the drawing in Figure 15. Let $Q$ be
the reference point, $P$ be an arbitrary point of interest
in space, and $\tilde{Q}, \tilde{P}$ be the projection of $Q$ and $P$ onto
the reference plane (see section 4 for definition of affine
structure under parallel projection).

The relationship between the points $P, Q, \tilde{P}, \tilde{Q}$ and
the points $p', \tilde{p}', q', \tilde{q}'$ can be described as a perspectivity
between two triangles. However, in order to establish
an invariant between the two triangles one must have a
coplanar point outside each of the triangles, therefore the
five corresponding points are not sufficient for determining
an invariant relation (this is known as the 'five point
invariant' which requires that no three of the points be
collinear). []

E  On  the  Intersection  of  Epipolar  Lines 

Barret  et  al.  (1991)  derive  a  quadratic  invariant  based 
on  Longuet-Higgins’  fundamental  matrix.  We  describe 
briefly  their  invariant  and  show  that  it  is  equivalent  to 
performing  re-projection  using  intersection  of  epipolar 
lines. 

In section 8 we derived Longuet-Higgins' fundamental
matrix relation $p'^t H p = 0$. Barret et al. note that the
equation can be written in vector form $h^t \cdot q = 0$, where
$h$ contains the elements of $H$ and

$$q = (x'x, x'y, x', y'x, y'y, y', x, y, 1).$$
Therefore, the matrix

$$B = \begin{bmatrix} q_1 \\ \vdots \\ q_9 \end{bmatrix}$$

must have a vanishing determinant. Given eight corresponding
points, the condition $|B| = 0$ leads to a constraint
line in terms of the coordinates of any ninth point,
i.e., $\alpha x + \beta y + \gamma = 0$. The location of the ninth point
in any third view can, therefore, be determined by intersecting
the constraint lines derived from views 1 and 3,
and views 2 and 3.

Another way of deriving this re-projection method is
by first noticing that $H$ is a correlation that maps $p$ onto
the corresponding epipolar line $l' = V_l \times p'$ (see section 8).
Therefore, from views 1 and 3 we have the relation

$$p''^t \tilde{H} p = 0,$$

and from views 2 and 3 we have the relation

$$p''^t \hat{H} p' = 0,$$

where $\tilde{H}p$ and $\hat{H}p'$ are two intersecting epipolar lines.
Given eight corresponding points, we can recover $\tilde{H}$ and
$\hat{H}$. The location of any ninth point $p''$ can be recovered
by intersecting the lines $\tilde{H}p$ and $\hat{H}p'$.

This way of deriving the re-projection method has an
advantage over using the condition $|B| = 0$ directly,
because one can use more than eight points in a least
squares solution (via SVD) for the matrices $\tilde{H}$ and $\hat{H}$.

Approaching the re-projection problem using intersection
of epipolar lines is problematic for novel views that
have a similar epipolar geometry to that of the two model
views (these are situations where the two lines $\tilde{H}p$ and
$\hat{H}p'$ are nearly parallel, such as when the object rotates
around nearly the same axis for all views). We therefore
expect sensitivity to errors also under conditions of
small separation between views. The method becomes
more practical if one uses multiple model views instead
of only two, because each model view adds one epipolar
line and all lines should intersect at the location of the
point of interest in the novel view.
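
The re-projection by intersection of epipolar lines described above can be sketched as follows. This is an illustration under stated assumptions: the two matrices are computed here from three hypothetical, known camera matrices purely to generate consistent test data (in the setting of this appendix they would be recovered from eight or more correspondences), and the helper names `skew`, `fundamental_from_cameras` and `reproject` are not from the original text.

```python
import numpy as np

def skew(v):
    """Matrix form of the cross product: skew(v) @ u equals np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def fundamental_from_cameras(P_a, P_b):
    """Matrix H with x_b^t H x_a = 0 for 3x4 cameras P_a, P_b (test data only)."""
    _, _, vt = np.linalg.svd(P_a)
    centre = vt[-1]                       # null vector of P_a: its centre of projection
    epipole = P_b @ centre                # image of that centre in view b
    return skew(epipole) @ P_b @ np.linalg.pinv(P_a)

def reproject(H13, H23, p, p_prime):
    """Intersect the epipolar lines H13 p and H23 p' in the third view."""
    q = np.cross(H13 @ p, H23 @ p_prime)
    return q / q[2]

# Three hypothetical cameras with non-collinear centres.
P1 = np.hstack([np.eye(3), [[0.0], [0.0], [0.0]]])
P2 = np.hstack([np.eye(3), [[-1.0], [0.0], [0.0]]])
P3 = np.hstack([np.eye(3), [[0.0], [-1.0], [0.0]]])

X = np.array([0.3, 0.2, 4.0, 1.0])        # hypothetical scene point
p, p_prime, p_novel = P1 @ X, P2 @ X, P3 @ X

H13 = fundamental_from_cameras(P1, P3)
H23 = fundamental_from_cameras(P2, P3)
estimate = reproject(H13, H23, p, p_prime)
```

The cross product of two lines becomes ill-conditioned as the lines approach parallel, which is the failure mode discussed in the text.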
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